Preface


Our aim in this book is to provide an introduction to the principles of algorithm 
design using a purely functional approach. Our language of choice is Haskell and all 
the algorithms we design will be expressed as Haskell functions. Haskell has many 
features for structuring function definitions, but we will use only a small subset of 
them. 

Using functions, rather than loops and assignment statements, to express algorithms
changes everything. First of all, an algorithm expressed as a function is
composed of other, more basic functions that can be studied separately and reused
in other algorithms. For instance, a sorting algorithm may be specified in terms of
building a tree of some kind and then flattening it in some way. Functions that build
trees can be studied separately from functions that consume trees. Furthermore, the
properties of each of these basic functions and their relationship to others can be
captured with simple equational properties. As a result, one can talk and reason
about the ‘deep’ structure of an algorithm in a way that is not easily possible with
imperative code. To be sure, one can reason formally about imperative programs by
formulating their specifications in the predicate calculus, and using loop invariants
to prove they are correct. But, and this is the nub, one cannot easily reason about
the properties of an imperative program directly in terms of the language of its
code. Consequently, books on formal program design have a quite different tone
from those on algorithm design: they demand fluency in both the predicate calculus
and the necessary imperative dictions. In contrast, many texts on algorithm design
traditionally present algorithms with a step-by-step commentary, and use informally
stated loop invariants to help one understand why the algorithm is correct.

With a functional approach there are no longer two separate languages to think
about, and one can happily calculate better versions of algorithms, or parts of
algorithms, by the straightforward process of equational reasoning. That, perhaps, is
the main contribution of this book. Although it contains a fair amount of equational
reasoning, we have tried to maintain a light touch. The plain fact of the matter is
that calculation is fun to do but boring to read - well, too much of it is. Although it
does not matter very much whether imperative algorithms are expressed in C or Java
or pseudo-code, the situation changes completely when algorithms are expressed
functionally.

Many of the problems considered in this book, especially in the later parts, begin 
with a specification of the task in hand, expressed as a composition of standard 
functions such as maps, filters, and folds, as well as other functions such as perms for 
computing all the permutations of a list, parts for computing all the partitions, and 
mktrees for building all the trees of a particular kind. These component functions 
are then combined, or fused, in various ways to construct a final algorithm with the 
required time complexity. A final sorting algorithm may not refer to the underlying 
tree, but the tree is still there in the structure of the algorithm. The notion of fusion 
dominates the technical and mathematical aspects of the design process and is really 
the driving force of the book. 

The disadvantage for any author of taking a functional approach is that, because
functional languages such as Haskell are not so well known as mainstream
procedural languages, one has to spend some time explaining them. That would 
add substantially to the length of the book. The simple solution to this problem 
is just to assume the necessary knowledge. There is a growing range of textbooks 
on languages like Haskell, including our own Thinking Functionally with Haskell 
(Cambridge University Press, 2014), and we will just assume the reader is familiar 
with the necessary material. Indeed, the present book was designed as a companion 
volume to the earlier book. A brief summary of what we do assume, and an even 
briefer reprise of some essential ideas, is given in the first chapter, but you will 
probably not be able to learn enough about Haskell there to understand the rest 
of the book. Even if you do know something about functional programming, but 
not about how equational reasoning enters the picture (some books on functional 
programming simply don’t mention equational reasoning), you will probably still 
have to refer to our earlier book. In any case, the mathematics involved in equational 
reasoning is neither new nor difficult. 

Books on algorithm design traditionally cover three broad areas: a collection of 
design principles, a study of useful data structures, and a number of interesting and 
intriguing algorithms that have been discovered over the centuries. Sometimes the 
books are arranged by principles, sometimes by topic (such as graph algorithms, or 
text algorithms), and sometimes by a mixture of both. This book mostly takes the 
first approach. It is devoted to five main design strategies underlying many effective 
algorithms: divide and conquer, greedy algorithms, thinning algorithms, dynamic 
programming, and exhaustive search. These are the design strategies that every 
serious programmer should know. The middle strategy, on thinning algorithms, 
is new, and serves in many problems as an alternative to dynamic programming. 
Each design strategy is allocated a part to itself, and the chapters on each strategy
cover a variety of algorithms from the well-known to the new. There is only a 
little material on data structures - only as much as we need. In the first part of the 
book we do discuss some basic data structures, but we will also rely on some of
Haskell’s libraries of other useful ways of structuring data. One reason for doing so 
is that we wanted the book not to be too voluminous; another reason is that there 
does exist one text, Chris Okasaki’s Purely Functional Data Structures (Cambridge 
University Press, 1998), that covers a lot of the material. Other books on functional 
data structures have been published since we began writing this book, and more are 
beginning to appear. 

Another feature of this book is that, as well as some firm favourites, it describes a 
number of algorithms that do not usually appear in books on algorithm design. Some 
of these algorithms have been adapted, elaborated, and simplified from yet another 
book published by Cambridge University Press: Pearls of Functional Algorithm 
Design (2010). The reason for this novelty is simply to make the book entertaining 
as well as instructive. Books on algorithm design are read, broadly speaking, by 
three kinds of people: academics who need reference material, undergraduate or 
graduate students on a course, and professional programmers simply for interest 
and enjoyment. Most professional programmers do not design algorithms but just 
take them from a library. Yet they too are a target audience for this book, because 
sometimes professional programmers want to know more about what goes into a 
good algorithm and how to think about them. 

Algorithms in real life are a good deal more intricate than the ones presented in 
this book. The shortest-path algorithm in a satellite navigation system is a good 
deal more complicated than a shortest-path algorithm as presented in a textbook 
on algorithm design. Real-life algorithms have to cope with the problems of scale, 
with the effective use of a computer’s hardware, with user interfaces, and with many 
other things that go into a well-designed and useful product. None of these aspects 
is covered in the present book, nor indeed in most books devoted solely to the 
principles of algorithm design. 

There is another feature of this book that deserves mention: all exercises are 
answered, if sometimes somewhat briefly. The exercises form an integral part of 
the text, and the questions and answers should be read even if the exercises are not 
attempted. Rather than have a complete bibliography at the end of the book, each 
chapter ends with references to (some of) the books and articles pertinent to the 
chapter. 

Most of the major programs in this book are available on the web site 


www.cs.ox.ac.uk/publications/books/adwh

You can also use this site to see a list of all known errors, as well as report new ones.
We also welcome suggestions for improvement, including ideas for new exercises. 

What makes a good algorithm? There are as many answers to this question as there
are to the question of what makes a good cookbook recipe. Is the recipe clear and 
easy to follow? Does the recipe use standard and well-understood techniques? Does 
it use widely available ingredients? Is the preparation time reasonably short? Does it 
involve many pots and pans and a lot of kitchen space? And so on and so on. Some 
people when asked this question say that what is most important about a recipe 
is whether the dish is attractive or not, a point we will try to bear in mind when 
expressing our functional algorithms. 

In the first three chapters we review the ingredients we need for designing good 
recipes for attractive algorithms in a functional kitchen, and describe the tools we
need for analysing their efficiency. Our functional language of choice is Haskell, 
and the ingredients are Haskell functions. These ingredients and the techniques for 
combining them are reviewed in the first chapter. Be aware that the chapter is not 
an introduction to Haskell; its main purpose is to outline what should be familiar 
territory to the reader, or at least territory that the reader should feel comfortable 
travelling in. 

The second chapter concerns efficiency, specifically the running time of algorithms.
We will ignore completely the question of space efficiency, for the plain
fact of the matter is that executing a functional program can take up quite a lot of
kitchen space. There are methods for controlling the space used in evaluating a
functional expression, but we refer the reader to other books for their elaboration.
That chapter reviews asymptotic notation for stating running times, and explores
how recurrence relations, which are essentially recursive functions for determining
the running times of recursive functions, can be solved to give asymptotic estimates.
The chapter also introduces, albeit fairly briefly, the notion of amortised running
times because it will be needed later in the book.

The final chapter in this part introduces a small number of basic data structures
that will be needed at one or two places in the rest of the book. These are symmetric 
lists, random-access lists, and purely functional arrays. Mostly we postpone discussion
of any data structure required to make an algorithm efficient until the algorithm
itself is introduced, but these three form a coherent group that can be discussed 
without having specific applications in mind. 




Chapter 1 


Functional programming 


Haskell is a large and powerful language, brimming with clever ideas about how 
to structure programs and possessing many bells and whistles. But in this book we 
will use only a small subset of the host of available features. So, no Monads, no 
Applicatives, no Foldables, and no Traversables. In this chapter we will spell out 
what we do need to construct effective algorithms. Some of the material will be 
revisited when particular problems are put under the microscope, so you should 
regard the chapter primarily as a way to check your understanding of the basic ideas 
of Haskell. 


1.1 Basic types and functions 

We will use only simple types, such as Booleans, characters, strings, numbers of 
various kinds, and lists. Most of the functions we use can be found in Haskell’s 
Standard Prelude (the Prelude library), or in the library Data.List. Be warned that 
the definitions we give of some of these functions may not be exactly the definitions 
given in these libraries: the library definitions are tuned for optimal performance 
and ours for clarity. We will use type synonyms to improve readability, and data 
declarations of new types, especially trees of various kinds. When necessary we 
make use of simple type classes such as Eq, Ord, and Num, but we will not introduce 
new ones. Haskell provides many kinds of number, including two kinds of integer, 
Int and Integer, and two kinds of floating-point number, Float and Double. Elements
of Int are restricted in range, usually [-2^63, 2^63) on 64-bit computers, though Haskell
compilers are only required to cover the range [-2^29, 2^29). Elements of Integer are
unrestricted. We will rarely use the floating-point numbers provided by Float and
Double. In one or two places we will use Rational arithmetic, where a Rational
number is the ratio of two Integer values. Haskell does not have a type of natural
numbers,* though the library Numeric.Natural does provide arbitrary-precision
ones. Instead, we will sometimes use the type synonym

type Nat = Int 

Haskell cannot enforce the constraint that elements of Nat be natural numbers, and 
we use the synonym purely to document intention. For example, we can assert 
that length :: [a] -> Nat because the length of a list, as defined in the Prelude,
is a nonnegative element of Int. Haskell also provides unsigned numbers in the
Data.Word library. Elements of Word are unsigned numbers and can represent
natural numbers n in the range 0 <= n < 2^64 on 64-bit machines. However, defining
type Nat = Word would be inconvenient simply because we could not then assert
that length :: [a] -> Nat.
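
As a small illustration of this point, the sketch below contrasts the two choices. The names len, lenW and wrapAround are ours, introduced only for this example, and the wrap-around behaviour shown is that of Word in GHC.

```haskell
-- A sketch only: len, lenW and wrapAround are hypothetical names.
type Nat = Int

-- Nat is merely a synonym for Int, so the Prelude's length
-- can be given this type without any conversion.
len :: [a] -> Nat
len = length

-- With Word an explicit conversion is needed, because length returns an Int.
lenW :: [a] -> Word
lenW = fromIntegral . length

-- Unsigned arithmetic does not fail below zero; in GHC it wraps around.
wrapAround :: Word
wrapAround = 0 - 1
```

Here wrapAround evaluates to the largest value of Word rather than signalling an error, which is one reason a synonym used "purely to document intention" is a modest but honest choice.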

Most important for our purposes are the basic functions that manipulate lists. 
Of these the most useful are map, filter, and folds of various kinds. Here is the 
definition of map: 

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs

The function map applies its first argument, a function, to every element of its
second argument, a list. The function filter is defined as follows:

filter :: (a -> Bool) -> [a] -> [a]
filter p [] = []
filter p (x:xs) = if p x then x : filter p xs else filter p xs

The function filter filters a list, retaining only those elements that satisfy the given
test. There are various fold functions on lists, most of which will be explained in
due course. Two of the important ones are foldr and foldl. The former is defined as
follows:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f e [] = e
foldr f e (x:xs) = f x (foldr f e xs)

The function foldr folds a list from right to left, starting with a value e and using a
binary operator ⊕ to reduce the list to a single value. For example,

foldr (⊕) e [x,y,z] = x ⊕ (y ⊕ (z ⊕ e))

In particular, foldr (:) [] xs = xs for all lists xs, including infinite lists. However, we
will not make much use of infinite lists in what follows, except for idioms such as

label :: [a] -> [(Nat,a)]
label xs = zip [0..] xs

* In the documentation for the GHC libraries, there is the statement “It would be very natural to add a type
Natural providing an unbounded size unsigned integer, just as Prelude.Integer provides unbounded size signed
integers. We do not do that yet since there is no demand for it.” Maybe this book will create such a demand.

As another example, we can write

length :: [a] -> Nat
length = foldr succ 0 where succ x n = n + 1

The second main function, foldl, folds a list from left to right:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f e [] = e
foldl f e (x:xs) = foldl f (f e x) xs

Thus

foldl (⊕) e [x,y,z] = ((e ⊕ x) ⊕ y) ⊕ z

For example, we could also write

length :: [a] -> Nat
length = foldl succ 0 where succ n x = n + 1

Note that foldl returns a well-defined value only on finite lists; evaluation of foldl
on an infinite list will never terminate. There is an alternative definition of foldl,
namely

foldl f e = foldr (flip f) e . reverse

where flip is a useful prelude function defined by

flip :: (a -> b -> c) -> b -> a -> c
flip f x y = f y x

Since one can reverse a list in linear time, this definition is asymptotically as fast as
the former. However, it involves two traversals of the input, one to reverse it and the
second to fold it.
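
The definitions in this section can be collected into one loadable file, as in the following sketch. The names foldlViaReverse, lengthR and lengthL are ours, chosen to keep the two variants apart; the Prelude versions of foldr and foldl are hidden so that the definitions from the text are the ones being exercised.

```haskell
import Prelude hiding (foldl, foldr)

type Nat = Int

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ e []     = e
foldr f e (x:xs) = f x (foldr f e xs)

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl _ e []     = e
foldl f e (x:xs) = foldl f (f e x) xs

-- The alternative definition: reverse the list, then fold from the right.
foldlViaReverse :: (b -> a -> b) -> b -> [a] -> b
foldlViaReverse f e = foldr (flip f) e . reverse

-- length written once with each fold; the unused argument is the list element.
lengthR, lengthL :: [a] -> Nat
lengthR = foldr (\_ n -> n + 1) 0
lengthL = foldl (\n _ -> n + 1) 0

-- Pair each element with its position, using the infinite list [0 ..].
label :: [a] -> [(Nat, a)]
label xs = zip [0 ..] xs
```

With a non-associative operator the direction shows: foldr (-) 10 [1,2,3] is 1 - (2 - (3 - 10)), while foldl (-) 10 [1,2,3] is ((10 - 1) - 2) - 3.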


1.2 Processing lists 

The difference between foldr and foldl prompts a general observation. When a 
programmer brought up in the imperative programming tradition meets functional 
programming for the first time, they are likely to feel that many computations seem 
to be carried out in the wrong order. Recursion has been described as the curious 
process of reaching one’s goal by walking backwards towards it. Specifically, lists 
often seem to be processed from right to left when the natural way surely appears to 
be from left to right. Appeals to naturalness are often suspicious, and appearances 
can be deceptive. We normally read an English sentence from left to right, but when 
we encounter a phrase such as “a lovely little old French silver butter knife” the 
adjectives have to be applied from right to left. If the knife was made of French 
silver, but not necessarily made in France, we have to write “a lovely little old
French-silver butter knife” to avoid ambiguity. Mathematical expressions too are 
usually understood from right to left, certainly those involving a chain of functional 
compositions. As to deceptiveness, the definition 

head = foldr (≪) ⊥ where x ≪ y = x

though a little strange is certainly correct and takes constant time. The evaluation
of foldr (≪), conceptually from right to left, is abandoned after the first element is
encountered. Thus

head (x:xs) = foldr (≪) ⊥ (x:xs)
            = x ≪ foldr (≪) ⊥ xs
            = x

The last step follows from the fact that Haskell is a lazy language in which evaluations
are performed only when needed, so evaluation of ≪ does not require
evaluation of its second argument.
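
This definition can be tried out directly, as in the sketch below. The name headR is ours, chosen to avoid a clash with the Prelude's head, and a call to error stands in for the undefined value ⊥.

```haskell
-- headR is a hypothetical name; error plays the role of the undefined value.
headR :: [a] -> a
headR = foldr first (error "headR: empty list")
  where
    -- first ignores its second argument, so the rest of the fold
    -- is never evaluated.
    first x _ = x
```

Because the second argument is never demanded, headR returns immediately even on an infinite list such as [1 ..], or on a list whose tail is undefined.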

Sometimes the direction of travel is important. For example, consider the following
two definitions of concat:

concat1, concat2 :: [[a]] -> [a]
concat1 = foldr (++) []
concat2 = foldl (++) []

We have concat1 xss = concat2 xss for all finite lists xss (see Exercise 1.10), but
which definition is better? We will look at the precise running times of the two
functions in the following chapter, but here is one way to view the problem. Imagine 
a long table on which there are a number of piles of documents. You have to assemble 
these documents into one big pile ensuring that the correct order is maintained, so 
the second pile (numbering from left to right) has to go under the first pile, the third 
pile under the second pile, and so on. You could start from left to right, picking up 
the first pile, putting it on top of the second pile, picking the combined pile up and 
putting it on top of the third pile, and so on. Or you could start at the other end, 
placing the penultimate pile on the last pile, the antepenultimate pile on top of that, 
and so on (even English words are direction-biased: the words ‘first’, ‘second’, and 
‘third’ are simple, but ‘penultimate’ and ‘antepenultimate’ are not). The left to right 
solution involves some heavy lifting, particularly at the last step when a big pile 
of documents has to be lifted up and placed on the last pile, but the right to left 
solution involves picking up only one pile at each step. So concat1 is potentially a
much more efficient way to concatenate a list of lists than concat2.
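
The pile analogy can be turned into a rough count. The sketch below assumes a simple cost model, namely that evaluating xs ++ ys takes length xs steps; cost1 and cost2 are hypothetical helpers of ours that total these steps, not functions from the text.

```haskell
concat1, concat2 :: [[a]] -> [a]
concat1 = foldr (++) []
concat2 = foldl (++) []

-- Assumed cost model: evaluating xs ++ ys takes length xs steps.

-- Right to left: each list is the left operand of (++) exactly once.
cost1 :: [[a]] -> Int
cost1 = sum . map length

-- Left to right: the left operands are the ever-growing combined piles
-- [], x1, x1 ++ x2, ... (every prefix concatenation except the last).
cost2 :: [[a]] -> Int
cost2 = sum . map length . init . scanl (++) []
```

For four piles of three documents each, cost1 gives 12 and cost2 gives 18; the gap widens quickly as the number of piles grows.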

Here is another example. Consider the problem of breaking a list of words into 
a list of lines, ensuring that the width of each line is at most some given bound. 
This problem is known as the paragraph problem, and there is a section devoted 
to it in Chapter 12. It seems natural to process the input from left to right, adding
successive words to the end of the current line until no more words will fit, in 
which case a new line is started. This particular algorithm is a greedy one. There 
are also non-greedy algorithms for the paragraph problem that process words from 
right to left. Part Three of the book is devoted to the study of greedy algorithms.
Nevertheless, these two examples apart, the direction of travel is often unimportant. 

The direction of travel is also related to another concept in algorithm design, 
the notion of an online algorithm. An online algorithm is one that processes a list 
without having the entire list available from the start. Instead, the list is regarded 
as a potentially infinite stream of values. Consequently, any online algorithm for 
solving a problem for a given stream also has to solve the problem for every prefix 
of the stream. And that means the stream has to be processed from left to right. In 
contrast, an offline algorithm is one that is given the complete list to start with, and 
can process the list in any order it wants. Online algorithms can usually be defined 
in terms of another basic Haskell function scanl, whose definition is as follows: 

scanl :: (b -> a -> b) -> b -> [a] -> [b]
scanl f e [] = [e]
scanl f e (x:xs) = e : scanl f (f e x) xs

For example,

scanl (⊕) e [x,y,z,...] = [e, e ⊕ x, (e ⊕ x) ⊕ y, ((e ⊕ x) ⊕ y) ⊕ z, ...]

In particular, scanl can be applied to an infinite list, producing an infinite list as 
result. 
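
A running total is perhaps the simplest online computation of this kind. In the sketch below, runningSums is a name of ours and the Prelude's scanl is used as it stands.

```haskell
-- runningSums is a hypothetical example of an online computation:
-- the k-th output depends only on the first k inputs.
runningSums :: [Integer] -> [Integer]
runningSums = scanl (+) 0
```

Since each output depends only on a prefix of the input, take 5 (runningSums [1 ..]) terminates and yields [0,1,3,6,10] even though the input stream is infinite.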


1.3 Inductive and recursive definitions 

While most functions make use of recursion, the nature of the recursion is different 
in different functions. The functions map, filter, and foldr all make use of structural
recursion. That is, the recursion follows the structure of lists built from the empty
list [] and the cons constructor (:). There is one clause for the empty list and another,
recursive clause for x:xs in terms of the value of the function for xs. We will call
such definitions inductive definitions. Most inductive definitions can be expressed
as instances of foldr. For example, both map and filter can be so expressed (see the
exercises).

Here is another example, an inductive definition of the function perms that returns
a list of all the permutations of a list (we call it perms1 because later on we will
meet another definition, perms2):

perms1 [] = [[]]
perms1 (x:xs) = [zs | ys <- perms1 xs, zs <- inserts x ys]
The permutations of a nonempty list are obtained by taking each permutation of
the tail of the list and returning all the ways the first element can be inserted. The 
function inserts is defined by 

inserts :: a -> [a] -> [[a]]
inserts x [] = [[x]]
inserts x (y:ys) = (x:y:ys) : map (y:) (inserts x ys)

For example,

inserts 1 [2,3] = [[1,2,3],[2,1,3],[2,3,1]]

The definition of perms1 uses explicit recursion and a list comprehension, but
another way is to use a foldr:

perms1 = foldr step [[]] where step x xss = concatMap (inserts x) xss

The useful function concatMap is defined by

concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = concat . map f

Observe that since

step x xss = (concatMap . inserts) x xss

the definition of perms1 can be expressed even more briefly as

perms1 = foldr (concatMap . inserts) [[]]

The idiom foldr (concatMap . steps) e will be used frequently in later chapters for
various definitions of steps and e, so keep the abbreviation in mind.
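
The two forms of perms1 can be placed side by side and compared, as in this sketch. The name perms1Fold is ours, used only to distinguish the foldr form from the explicitly recursive one.

```haskell
inserts :: a -> [a] -> [[a]]
inserts x []     = [[x]]
inserts x (y:ys) = (x:y:ys) : map (y:) (inserts x ys)

-- Explicit recursion with a list comprehension.
perms1 :: [a] -> [[a]]
perms1 []     = [[]]
perms1 (x:xs) = [zs | ys <- perms1 xs, zs <- inserts x ys]

-- The same function as an instance of the foldr (concatMap . steps) e idiom.
perms1Fold :: [a] -> [[a]]
perms1Fold = foldr (concatMap . inserts) [[]]
```

Both produce the permutations in the same order; for [1,2,3] the first three results are the three ways of inserting 1 into [2,3], followed by the three ways of inserting it into [3,2].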

Here is another way of generating permutations, one that is recursive rather than 
inductive: 

perms2 [] = [[]]
perms2 xs = [x:zs | (x,ys) <- picks xs, zs <- perms2 ys]

picks :: [a] -> [(a,[a])]
picks [] = []
picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs]

The function picks picks an arbitrary element from a list in all possible ways,
returning both the element and what remains. The function perms2 computes a
permutation by picking an arbitrary element of a nonempty list as a first element,
and following it with a permutation of the rest of the list.

The function perms2 uses a list comprehension, but an equivalent way is to write

perms2 [] = [[]]
perms2 xs = concatMap subperms (picks xs)
  where subperms (x,ys) = map (x:) (perms2 ys)

Expressing perms2 in this way rather than by a list comprehension helps with
equational reasoning, and also with the analysis of its running time. We will return
to both perms1 and perms2 in the following chapter.
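
For experimentation, the recursive definition can be loaded as follows; this is simply the text's picks and the concatMap form of perms2 with type signatures added.

```haskell
-- All ways of choosing one element, paired with the remaining elements.
picks :: [a] -> [(a, [a])]
picks []     = []
picks (x:xs) = (x, xs) : [(y, x:ys) | (y, ys) <- picks xs]

-- Pick a first element in every possible way, then permute what remains.
perms2 :: [a] -> [[a]]
perms2 [] = [[]]
perms2 xs = concatMap subperms (picks xs)
  where
    subperms (x, ys) = map (x:) (perms2 ys)
```

Note that perms2 lists the permutations of [1,2,3] in a different order from perms1: all those beginning with 1 come first, then those beginning with 2, then those beginning with 3.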

The different styles, recursive or inductive, of the definitions of basic combinatorial
functions, such as permutations, partitions, or subsequences, lead to different
kinds of final algorithm. For example, divide-and-conquer algorithms are usually
recursive, while greedy and thinning algorithms are usually inductive. To appreciate
that there may be different algorithms for one and the same problem, one has to go
back to the definitions of the basic functions used in the specification of the problem
and see if they can be defined differently. For example, the inductive definition of
perms1 leads to Insertion sort, while the recursive definition of perms2 leads to
Selection sort. These two sorting algorithms will be introduced in the context of greedy
algorithms in Part Three. The general point is a key one for functional algorithm
design: different solutions for problems arise simply because there are different
but equally clear definitions of one or more of the basic functions describing the
solution.
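
Purely as a preview, and not as the derivation given later in the book, the following sketch shows the shapes the two sorting algorithms might take. The names isort and ssort are ours, and the use of insert and minimumBy from Data.List is an assumption made for brevity.

```haskell
import Data.List (insert, minimumBy)
import Data.Ord (comparing)

picks :: [a] -> [(a, [a])]
picks []     = []
picks (x:xs) = (x, xs) : [(y, x:ys) | (y, ys) <- picks xs]

-- Insertion sort: inductive, like perms1. Each element is inserted
-- into the already sorted result for the rest of the list.
isort :: Ord a => [a] -> [a]
isort = foldr insert []

-- Selection sort: recursive, like perms2. The smallest element is
-- picked first and the remainder is sorted recursively.
ssort :: Ord a => [a] -> [a]
ssort [] = []
ssort xs = x : ssort ys
  where
    (x, ys) = minimumBy (comparing fst) (picks xs)
```

The structural parallel is the point here: isort follows the foldr pattern of perms1, while ssort recurses on a list produced by picks, just as perms2 does.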

While functional programming relies solely on recursion to define arbitrarily long
computations, imperative programming can also make use of loops of various kinds,
including while and until loops. We can define and use loops in Haskell too. For
example,

until :: (a -> Bool) -> (a -> a) -> a -> a
until p f x = if p x then x else until p f (f x)

is a recursive definition of the function until that repeatedly applies a function to a
value until the result satisfies some condition. We will encounter until again later in
the book. Given until we can define while by

while p = until (not . p)

We can also define a functional version of simple for-loops in which a function is
applied to an argument a specified number of times (see the exercises).
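
As a brief illustration of these two loops, consider the sketch below. The function powerAbove is a hypothetical example of ours, and the Prelude's own until is hidden so that the definition from the text is the one in use.

```haskell
import Prelude hiding (until)

until :: (a -> Bool) -> (a -> a) -> a -> a
until p f x = if p x then x else until p f (f x)

-- while p keeps applying the function as long as p holds.
while :: (a -> Bool) -> (a -> a) -> a -> a
while p = until (not . p)

-- Hypothetical example: the least power of two that exceeds n.
powerAbove :: Integer -> Integer
powerAbove n = until (> n) (* 2) 1
```

For instance, powerAbove 100 doubles 1 repeatedly, passing through 64 before stopping at 128; the same result is obtained with while (<= 100) (* 2) 1.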


1.4 Fusion 

The most powerful technique for constructing efficient algorithms lies in our ability
to fuse two computations together into one computation. Here are three simple
examples:

map f . map g = map (f . g)
concatMap f . map g = concatMap (f . g)
foldr f e . map g = foldr (f . g) e

The first equation says that the two-step process of applying one function to every
element of a list, and then applying a second function to every element of the
result, can be replaced by a one-step traversal in which the composition of the two
functions is applied to each element. The second equation is an instance of the first
one, and the third is yet another example of when two traversals can be replaced by
a single traversal.
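
These three laws can be spot-checked on sample data, which is no substitute for a proof but is a useful sanity test. In the sketch below the function names and the particular choices of f, g and e are ours, picked arbitrarily for illustration.

```haskell
-- Each function returns True when the two sides of a law agree on xs.
-- The particular f, g and e are arbitrary choices made for illustration.

mapMap :: [Int] -> Bool
mapMap xs = (map f . map g) xs == map (f . g) xs
  where
    f = (+ 1)
    g = (* 2)

concatMapMap :: [Int] -> Bool
concatMapMap xs = (concatMap f . map g) xs == concatMap (f . g) xs
  where
    f x = [x, x]
    g = (* 2)

-- Here (f . g) x y = f (g x) y, so g is applied to each element
-- just before it is combined with the result for the rest of the list.
foldrMap :: [Int] -> Bool
foldrMap xs = (foldr f e . map g) xs == foldr (f . g) e xs
  where
    f = (+)
    e = 0
    g = (* 2)
```

In each case the right-hand side makes a single pass over xs, whereas the left-hand side builds an intermediate list first.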

Here is another example of a fusion law, one for you to solve:

foldr f e . concat = ????

Pause for a minute or so to try and complete the right-hand side. It is a good test of
your understanding of the material so far. But do not be discouraged if you cannot
find the answer, because it is not too obvious and many experienced functional 
programmers would fail to spot it. In a moment we will show how this particular 
fusion rule follows from one single master rule. Indeed, that is how we ourselves 
know the right-hand side, not by memorising it but by reconstructing it from the 
master rule. 


You probably paused for a short time, gave up and then read on. But you don’t 
get away that easily. Try this simpler version first: 

foldr f e (xs ++ ys) = ????

Having answered this question, can you now answer the first one? 


The answers to both questions will be given shortly. The master fusion rule is the 
fusion law of foldr. This law states that

h (foldr f e xs) = foldr g (h e) xs

for all finite lists xs provided

h (f x y) = g x (h y)

for all x and y. The proviso is called the fusion condition. Without one extra proviso,
the restriction to finite lists is necessary (see the exercises). The proof of the fusion
rule is by induction on the structure of a list. There are two cases, a base case and
an induction step. The base case is

    h (foldr f e [])
=     { definition of foldr }
    h e
=     { definition of foldr }
    foldr g (h e) []

The induction step is 



    h (foldr f e (x:xs))
=     { definition of foldr }
    h (f x (foldr f e xs))
=     { fusion condition }
    g x (h (foldr f e xs))
=     { induction }
    g x (foldr g (h e) xs)
=     { definition of foldr }
    foldr g (h e) (x:xs)

This completes the induction and the proof of the fusion law of foldr. 

Returning to our two problems, the answer to the easier one is 

foldr f e (xs ++ ys) = foldr f (foldr f e ys) xs

For the more difficult one we have concat = foldr (++) [], and the fusion law says
that

h (foldr (++) [] xss) = foldr g (h []) xss

provided g satisfies

h (xs ++ ys) = g xs (h ys)

But h = foldr f e, and the solution to the easier problem says we can satisfy the
fusion condition by taking g = flip (foldr f). Thus

foldr f e . concat = foldr (flip (foldr f)) e

over finite lists. Well done if you got it. 
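
To see the fusion law at work on a concrete case, the sketch below instantiates it with h = (* 2), f = (+) and e = 0, and then restates the concat law as a pair of functions that can be compared. All four names are ours, introduced only for this illustration.

```haskell
-- An instance of the fusion law with h = (* 2), f = (+) and e = 0.
-- The fusion condition h (f x y) = g x (h y) reads
--   2 * (x + y) = g x (2 * y),
-- which holds for g x z = 2 * x + z. Also h e = 2 * 0 = 0.
doubleSum, doubleSumFused :: [Integer] -> Integer
doubleSum      = (* 2) . foldr (+) 0
doubleSumFused = foldr (\x z -> 2 * x + z) 0

-- The law for concat derived in the text, for arbitrary f and e.
-- Note that flip (foldr f) xs r = foldr f r xs.
foldConcat, foldConcatFused :: (a -> b -> b) -> b -> [[a]] -> b
foldConcat      f e = foldr f e . concat
foldConcatFused f e = foldr (flip (foldr f)) e
```

The fused versions never build the intermediate value: doubleSumFused does not form the undoubled sum, and foldConcatFused does not form the concatenated list.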

Before ending the section, let us make a remark about styles of reasoning. The 
proof of the fusion rule for foldr was carried out at the point-level, meaning that all 
functions were fully applied to their arguments. It is also called point-wise reasoning. 
Contrast this with the following proof: 

    map f . filter p . concat
=     { distributing filter over concat }
    map f . concat . map (filter p)
=     { distributing map over concat }
    concat . map (map f) . map (filter p)
=     { property of map }
    concat . map (map f . filter p)
=     { definition of concatMap }
    concatMap (map f . filter p)

This calculation is carried out at the function level using functional composition 
as the basic combining form. It is also called point-free reasoning (and sometimes 
pointless reasoning by wags). When applicable, point-free reasoning is more attractive
than point-wise reasoning, if only because there are fewer parentheses to write.
For this reason we often write things like

h . foldr f e = foldr g (h e)

without mentioning the list to which both sides are applied. However, as we said
above, without an additional proviso the fusion law is true only for finite lists. For
that reason we often state point-free equations with a rider “for all finite lists” or
“over finite lists”. We did exactly this earlier in this section, though it so
happens that the equation

foldr f e . concat = foldr (flip (foldr f)) e
is true for all lists, infinite as well as finite. 
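
Both the point-free calculation above and the remark about infinite lists can be spot-checked, as in the following sketch. The names lhs, rhs, flatten and flattenFused are ours; the choice f = (:) and e = [] for the infinite case is one convenient instance, not the general claim.

```haskell
-- lhs and rhs are the first and last lines of the point-free calculation.
lhs, rhs :: (a -> b) -> (a -> Bool) -> [[a]] -> [b]
lhs f p = map f . filter p . concat
rhs f p = concatMap (map f . filter p)

-- The concat fusion law with f = (:) and e = []. Both sides produce
-- output lazily, so they can be compared on an infinite list of lists
-- by inspecting a finite prefix of the result.
flatten, flattenFused :: [[a]] -> [a]
flatten      = foldr (:) [] . concat
flattenFused = foldr (flip (foldr (:))) []
```

Applied to the infinite list [[n, n] | n <- [1 ..]], both flatten and flattenFused yield a list whose first five elements are [1,1,2,2,3].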


1.5 Accumulating and tupling 

Sometimes a clean and simple definition has to be tweaked to make it efficient. Here 
is one rather artificial example. Given a list xss of lists of integers, consider the 
problem of concatenating the shortest prefix of xss whose total sum is positive. If 
no sum is positive, then the whole list is concatenated. Let collapse be the function 
that carries out this process, so for example 

collapse [[\],[-%[2,A]] =[1] 

[[-2,1],[-3],[2,4]] = [-2,1,-3,2,4] 
collapse [[-2M[3U2fl]\ =[-2,1,3] 

The simplest way to define collapse is in ferms of a helper funclion which accumu¬ 
lates fhe required prefix in ifs firsf argumenf: 

    collapse :: [[Int]] → [Int]
    collapse xss = help [] xss
    help xs xss = if sum xs > 0 ∨ null xss then xs
                  else help (xs ++ head xss) (tail xss)

Ignore completely what this particular function might be useful for and concentrate
only on the fact that collapse appears to be doing a lot of work in recomputing sums.
As each list is constructed in the first argument (the accumulating parameter) of 
help, its sum is recomputed from scratch. We can do better by tupling each list with 
its sum. Replace the definition of collapse with the following one: 

    collapse xss = help (0, []) (labelsum xss)
    labelsum xss = zip (map sum xss) xss
    help (s, xs) xss = if s > 0 ∨ null xss then xs
                       else help (cat (s, xs) (head xss)) (tail xss)
    cat (s, xs) (t, ys) = (s + t, xs ++ ys)

Each list is paired with its sum and this pairing is threaded through the computation.
There are no sum computations in the revised definition of help but only in the 
function labelsum. There is, however, a single + operation in the definition of cat. 
In the worst case, the cost of computing these sums is now linear in the total length 
of the input, whereas with the previous definition it was quadratic. 
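
For readers who want to run the tupled version, here is one way it might be transcribed into plain Haskell. This is a sketch under two assumptions: the type signatures are our own additions, and the head/tail test is expressed with a case expression rather than an if, which is a stylistic choice and not part of the definition above.

```haskell
-- Tupled collapse: each component list is paired with its sum once, up front.
collapse :: [[Int]] -> [Int]
collapse xss = help (0, []) (labelsum xss)

labelsum :: [[Int]] -> [(Int, [Int])]
labelsum xss = zip (map sum xss) xss

-- Stop as soon as the running sum is positive or the input is exhausted.
help :: (Int, [Int]) -> [(Int, [Int])] -> [Int]
help (s, xs) xss = case xss of
  p : rest | s <= 0 -> help (cat (s, xs) p) rest
  _                 -> xs

-- Combine two labelled lists: add the sums, concatenate the lists.
cat :: (Int, [Int]) -> (Int, [Int]) -> (Int, [Int])
cat (s, xs) (t, ys) = (s + t, xs ++ ys)
```

The three example evaluations of collapse given earlier can be used to check the transcription.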

The remaining problem with collapse is the ++ operation in cat. The concatenations
are performed from left to right. For example,

collapse [[-5,3], [-2], [-4], [-4,1]] = 

        ((([] ++ [-5,3]) ++ [-2]) ++ [-4]) ++ [-4,1]

As we have seen, this is an inefficient way to concatenate lists. One way to solve 
the problem is to replace the accumulating list in the definition of help with an 
accumulating function: 

    collapse xss = (help (0, id) (labelsum xss)) []

    help (s, f) xss = if s > 0 ∨ null xss then f
                      else help (s + t, f · (xs ++)) (tail xss)
                      where (t, xs) = head xss

At the end of the computation the accumulating function is applied to the empty list. 
For example, we now have 

collapse [[-5,3], [-2], [-4], [-4,1]] 

    = (([-5,3] ++) · ([-2] ++) · ([-4] ++) · ([-4,1] ++)) []

    = [-5,3] ++ ([-2] ++ ([-4] ++ ([-4,1] ++ [])))

The concatenation is now from right to left and is more efficient. This trick of using 
an accumulating function to achieve efficient concatenation will appear again at 
various places in the book. 
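
A runnable rendering of the accumulating-function version might look as follows. It is a sketch only: the type signatures and the use of a case expression in place of head and tail are our own assumptions, made so that the block compiles on its own.

```haskell
-- collapse with an accumulating function: the accumulator is a function that
-- prepends everything collected so far, so each ++ is applied right to left.
collapse :: [[Int]] -> [Int]
collapse xss = help (0, id) (labelsum xss) []

labelsum :: [[Int]] -> [(Int, [Int])]
labelsum xss = zip (map sum xss) xss

help :: (Int, [Int] -> [Int]) -> [(Int, [Int])] -> ([Int] -> [Int])
help (s, f) xss = case xss of
  (t, xs) : rest | s <= 0 -> help (s + t, f . (xs ++)) rest
  _                       -> f
```

On the example above, collapse [[-5,3],[-2],[-4],[-4,1]] returns the whole concatenation because no prefix has a positive sum.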

The general point we want to make with this example is the idea of tupling, 
whereby a computation can be made more efficient by tupling values of interest 
together and threading them through the computation. In that way such values do not 
have to be recomputed from scratch each time. Tupling is, in fact, a simple version 
of the idea of memoising the values of a function to save computing them more than 
once. Memoisation will be treated more fully in Part Five on dynamic programming 
algorithms. Generally, though not always, we will leave such tupling optimisations 
to the last stage in the design of an efficient algorithm because they add little by 
way of understanding and can clutter and obscure the code. Premature optimisation 
is the root of all evil in programming. We mention the tupling optimisation now 
because it is used a number of times in the final versions of some algorithms. 

The twin techniques of accumulating parameters and tupling are useful devices 
for improving the running time of algorithms, but the mother of all such devices is 
fusion. Practically every algorithm in this book benefits from fusion of some kind, 
so if you take away anything from this chapter, make sure to include the two central 
principles of good algorithm design: (i) formulate the problem in terms of basic,
well-understood ingredients; and (ii) fuse the components into a dish that is finally 
ready to leave the kitchen. 


1.6 Chapter notes 

The particular version of Haskell used in this book is Haskell 8.0, released in May 
2016. The website www.haskell.org shows you how to download the Haskell 
Platform, a bundled system with lots of useful libraries. In addition to a compiler,
the platform also provides an interpreter GHCi that we will use to illustrate
some computations. The website also contains a wealth of material about Haskell,
including a complete list of books on Haskell programming and a number of online
tutorials about various aspects and features of the language. The wikibook
en.wikibooks.org/wiki/Haskell also contains much useful information. Our
own book on functional programming in Haskell [1] gives some help on how the 
use of a special function seq can be deployed to control the amount of space it takes 
to evaluate (some) functional expressions. 

The quote about premature optimisation being the root of all evil in programming 
is due to Don Knuth [4], though it may have appeared earlier. The phrase “a 
lovely little old rectangular green French silver whittling knife” was used by Mark 
Forsyth [2] to show that adjectives in English absolutely have to be in the following 
order: opinion, size, age, shape, colour, origin, material, purpose. Any deviation from 
this order just sounds wrong. However, the linguistic device known as hyperbaton 
changes the word order for rhetorical effect. 

The technique of using an accumulating function to change the order in which 
concatenations are performed was first written up in [3], though it was used earlier 
by a number of programmers. 

Exercise 1.1 Here are some other basic list-processing functions we will need:

    maximum, take, takeWhile, inits, splitAt, null, elem, zipWith,
    minimum, drop, dropWhile, tails, span, all, (!!)

To check your understanding, just give appropriate types. 

Exercise 1.2 Trawling through Data.List we discovered the function 
    uncons :: [a] → Maybe (a, [a])

of whose existence we were quite unconscious. Guess the definition of uncons. 

Exercise 1.3 The library Data.List does not provide functions 

    wrap   :: a → [a]
    unwrap :: [a] → a
    single :: [a] → Bool

for wrapping a value into a singleton list, unwrapping a singleton list into its sole 
occupant, and testing a list for being a singleton. This is a pity, for the three functions 
can be very useful on occasions and will appear a number of times in the rest of this 
book. Give appropriate definitions. 

Exercise 1.4 Write down a definition of reverse that takes linear time. One possibility
is to use a foldl.

Exercise 1.5 Express both map and filter as an instance of foldr.

Exercise 1.6 Express foldr f e · filter p as an instance of foldr.

Exercise 1.7 The function takeWhile returns the longest initial segment of a list all 
of whose elements satisfy a given test. Moreover, its running time is proportional to 
the length of the result, not the length of the input. Express takeWhile as an instance 
of foldr, thereby demonstrating once again that a foldr need not process the whole
of its argument before terminating. 

Exercise 1.8 The Data.List library contains a function dropWhileEnd which drops 
the longest suffix of a list all of whose elements satisfy a given Boolean test. For
example 

dropWhileEnd even [1,4,3,6,2,4] = [1,4,3] 

Define dropWhileEnd as an instance of foldr.

Exercise 1.9 An alternative definition of foldr is

    foldr f e xs = if null xs then e else f (head xs) (foldr f e (tail xs))

Dually, an alternative definition of foldl is

    foldl f e xs = if null xs then e else f (foldl f e (init xs)) (last xs)

where last and init are dual to head and tail. What is the problem with this definition
of foldl?

Exercise 1.10 Bearing the examples

    foldr (⊕) e [x,y,z] = x ⊕ (y ⊕ (z ⊕ e))
    foldl (⊕) e [x,y,z] = ((e ⊕ x) ⊕ y) ⊕ z

in mind, under what simple conditions on ⊕ and e do we have

    foldr (⊕) e xs = foldl (⊕) e xs

for all finite lists xs?

Exercise 1.11 Given a list of digits representing a natural number, construct a 
function integer which converts the digits into that number. For example, 

    integer [1,4,8,4,9,3] = 148493

Next, given a list of digits representing a real number r in the range 0 ≤ r < 1,
construct a function fraction which converts the digits into the corresponding
fraction. For example,

fraction [1,4,8,4,9,3] =0.148493 

Exercise 1.12 Complete the right-hand sides of 

    map (foldl f e) · inits = ????
    map (foldr f e) · tails = ????

Exercise 1.13 Define the function

    apply :: Nat → (a → a) → a → a

that applies a function a specified number of times to a value.

Exercise 1.14 Can the function inserts associated with the inductive definition of
perms1 be expressed as an instance of foldr?

Exercise 1.15 Give a definition of remove for which

    perms3 [] = [[]]
    perms3 xs = [x : ys | x ← xs, ys ← perms3 (remove x xs)]

computes the permutations of a list. Is the first clause necessary? What is the type
of perms3, and can one generate the permutations of a list of functions with this
definition?

Exercise 1.16 What extra condition is needed for the fusion law of foldr to be valid
over all lists, finite and infinite?

Exercise 1.17 As stated, the fusion law for foldr requires the proviso

    h (f x y) = g x (h y)

for all x and y. The proviso is actually too general. Can you spot what the necessary
and sufficient fusion condition is? To help you, here is an example, admittedly
a rather artificial one, where a more restricted version of the fusion condition is
necessary. Define the function replace by

    replace x = if even x then x else 0

We claim that replace · foldr f 0 = foldr f 0 on finite lists, where

    f :: Int → Int → Int
    f x y = 2 × x + y

Prove this fact by using the more restricted proviso.

Exercise 1.18 We referred to the fusion rule of foldr as the master fusion rule, but
there is another master rule, the fusion rule for foldl. What is this rule?

Exercise 1.19 Is the following statement true or false? “The original definition of 
collapse is more efficient than the optimised versions in the best case, when the first 
prefix has positive sum, because the sums of the remaining lists are not required. In 
the optimised version the sums of all the component lists are required.” 

Exercise 1.20 Find a definition of op so that 
concat xss = foldl op id xss [ ] 

Exercise 1.21 A list of numbers is said to be steep if each number is greater than the 
sum of the elements following it. Give a simple definition of the Boolean function 
steep for determining whether a sequence of numbers is steep. What is the running 
time and how can you improve it by tupling? 


Answers 


Answer 1.1 We have 


    maximum, minimum     :: Ord a ⇒ [a] → a
    take, drop           :: Nat → [a] → [a]
    takeWhile, dropWhile :: (a → Bool) → [a] → [a]
    inits, tails         :: [a] → [[a]]
    splitAt              :: Nat → [a] → ([a], [a])
    span                 :: (a → Bool) → [a] → ([a], [a])
    null                 :: [a] → Bool
    all                  :: (a → Bool) → [a] → Bool
    elem                 :: Eq a ⇒ a → [a] → Bool
    (!!)                 :: [a] → Nat → a
    zipWith              :: (a → b → c) → [a] → [b] → [c]


In Haskell 8.0 some of these functions have more general types. For example,

    maximum, minimum :: (Foldable t, Ord a) ⇒ t a → a

The type class Foldable describes data structures that can be folded. For instance
Foldable t contains a method foldr of type

    foldr :: (a → b → b) → b → t a → b

Lists are foldable and we will use foldr only on lists.

Answer 1.2 The function uncons is defined by 

    uncons []       = Nothing
    uncons (x : xs) = Just (x, xs)

Answer 1.3 The simple definitions are 

    wrap x     = [x]
    unwrap [x] = x
    single [x] = True
    single _   = False

Note that head and unwrap are different functions. 

Answer 1.4 The definition is 

    reverse :: [a] → [a]
    reverse = foldl (flip (:)) []

Answer 1.5 We have 

    map f    = foldr op [] where op x xs = f x : xs
    filter p = foldr op [] where op x xs = if p x then x : xs else xs

Answer 1.6 We have foldr f e · filter p = foldr op e, where

    op x y = if p x then f x y else y

Answer 1.7 We have 

    takeWhile :: (a → Bool) → [a] → [a]
    takeWhile p = foldr op [] where op x xs = if p x then x : xs else []

For example,

      takeWhile even [2,3,4,5]
    = op 2 (takeWhile even [3,4,5])
    = 2 : takeWhile even [3,4,5]
    = 2 : op 3 (takeWhile even [4,5])
    = 2 : []

Answer 1.8 We have

    dropWhileEnd :: (a → Bool) → [a] → [a]
    dropWhileEnd p = foldr op []
      where op x xs = if p x ∧ null xs then [] else x : xs

Answer 1.9 While 

    head :: [a] → a
    head (x : xs) = x

    tail :: [a] → [a]
    tail (x : xs) = xs

both take constant time, the dual functions

    last :: [a] → a
    last [x]      = x
    last (x : xs) = last xs

    init :: [a] → [a]
    init [x]      = []
    init (x : xs) = x : init xs

both take linear time because the whole list has to be traversed. That makes the 
alternative definition of foldl very inefficient. 

Answer 1.10 One simple condition is that ⊕ is an associative operation with identity
element e. For example, addition is an associative operation with identity element 0,
so foldr (+) 0 xs = foldl (+) 0 xs for all finite lists xs.

Answer 1.11 For the function that converts a decimal to an integer, a definition by 
foldl is appropriate: 

    integer = foldl shiftl 0 where shiftl n d = 10 × n + d

For the function that converts a decimal to a fraction, a definition by foldr is
appropriate:

    fraction = foldr shiftr 0 where shiftr d x = (fromIntegral d + x) / 10

The use of fromIntegral is necessary to convert a digit (an integer) to a floating-point
number since division is not defined on integers.

Answer 1.12 We have 

    map (foldl f e) · inits = scanl f e
    map (foldr f e) · tails = scanr f e

where scanr is a prelude function, dual to scanl. These results are known collectively
as the Scan Lemma and can be very useful in text-processing algorithms such as
those in Chapter 11.
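
Both halves of the Scan Lemma can be tested on sample inputs. The sketch below is only a spot check, not a proof; the names scanLemmaL and scanLemmaR are hypothetical helpers introduced for illustration.

```haskell
import Data.List (inits, tails)

-- Left half:  map (foldl f e) . inits = scanl f e
scanLemmaL :: Eq b => (b -> a -> b) -> b -> [a] -> Bool
scanLemmaL f e xs = map (foldl f e) (inits xs) == scanl f e xs

-- Right half: map (foldr f e) . tails = scanr f e
scanLemmaR :: Eq b => (a -> b -> b) -> b -> [a] -> Bool
scanLemmaR f e xs = map (foldr f e) (tails xs) == scanr f e xs
```

Subtraction is a useful test operator here because it is neither associative nor commutative, so an accidental swap of arguments would be detected.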

Answer 1.13 There are two possible definitions:

    apply 0 f = id
    apply n f = f · apply (n-1) f

    apply 0 f = id
    apply n f = apply (n-1) f · f

Functional composition is an associative operation, so the definitions are equivalent.

Answer 1.14 Yes it can, although the definition is not immediately obvious:

    inserts x = foldr step [[x]]
      where step y yss = (x : y : ys) : map (y:) yss
              where ys = tail (head yss)

This definition relies on the fact that head (inserts x ys) = x : ys.

Answer 1.15 The function remove removes the first occurrence of a given element
in a given list:

    remove :: Eq a ⇒ a → [a] → [a]
    remove x []       = []
    remove x (y : ys) = if x == y then ys else y : remove x ys

The first clause of perms3 is indeed necessary; without it we have perms3 [] = [].
From this one can show that perms3 returns the empty list for all arguments. The
type of perms3 is perms3 :: Eq a ⇒ [a] → [[a]], so, no, one cannot generate the
permutations of a list of functions using this definition since functions cannot be
tested for equality.

Answer 1.16 We would have to show the validity of fusion when the input is the
undefined list ⊥. Since foldr f e ⊥ = ⊥ we require that h has to be a strict function,
returning the undefined value if the argument is undefined.

Answer 1.17 Taking the example first, the original fusion condition requires that

    replace (2 × x + y) = 2 × x + replace y

which is not true if y is odd. But we do have

    replace (f x (foldr f 0 xs)) = f x (replace (foldr f 0 xs))

because foldr f 0 xs is always an even number. The more general fusion law, which
we will call context-sensitive fusion, is that

    h (foldr f e xs) = foldr g (h e) xs

for all finite lists xs provided that

    h (f x (foldr f e xs)) = g x (h (foldr f e xs))

for all x and finite lists xs. Context-sensitive fusion will be needed in order to show
that some problems can be solved by greedy algorithms.

Answer 1.18 We have

    h (foldl f e xs) = foldl g (h e) xs

for all finite lists xs provided that

    h (f y x) = g (h y) x

for all y and x. The proof that this proviso is sufficient is by induction, but we have to
be careful and first generalise the induction hypothesis by replacing e by an arbitrary
value y:

    h (foldl f y xs) = foldl g (h y) xs

Then we have

    h (foldl f y (x : xs))

= { definition of foldl }

    h (foldl f (f y x) xs)

= { induction }

    foldl g (h (f y x)) xs

= { proviso }

    foldl g (g (h y) x) xs

= { definition of foldl }

    foldl g (h y) (x : xs)

The induction step would not be valid without the generalisation. 
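
As a small illustration of the foldl fusion rule, take h to be doubling and f to be addition. The choice of h, f and g below is our own hypothetical instance, picked so that the proviso h (f y x) = g (h y) x holds by simple arithmetic: 2 × (y + x) = 2 × y + 2 × x.

```haskell
-- h doubles its argument; g adds twice the next element to the accumulator.
h :: Int -> Int
h = (* 2)

g :: Int -> Int -> Int
g y x = y + 2 * x

-- The two sides of the fusion law  h (foldl f e xs) = foldl g (h e) xs,
-- instantiated with f = (+) and e = 0.
fusedLhs, fusedRhs :: [Int] -> Int
fusedLhs xs = h (foldl (+) 0 xs)
fusedRhs xs = foldl g (h 0) xs
```

For any finite list the two functions should agree; fusedRhs makes a single pass and never builds the intermediate sum.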

Answer 1.19 No, it is false. Haskell is a lazy language in which only those values 
which contribute to the answer are computed. In the best case of collapse the 
remaining sums are discarded so they are never computed. 

Answer 1.20 We can take op f xs ys = f (xs ++ ys), though of course we cannot
concatenate an infinite list of lists this way.

Answer 1.21 A simple definition: 

    steep []       = True
    steep (x : xs) = x > sum xs ∧ steep xs

This definition computes sum on every tail of the list. Since computation of sum 
takes linear time, computation of steep takes quadratic time. To obtain a linear-time 
algorithm we can tuple sum and steep, leading to the definition 

    steep = snd · faststeep
    faststeep []       = (0, True)
    faststeep (x : xs) = (x + s, x > s ∧ b) where (s, b) = faststeep xs


The secret of a successful algorithm, like the secret of successful comedy, is all
about timing. In this chapter we review the tools needed for analysing the running
times of functional algorithms and illustrate their use on one or two examples.
The criteria for success should include space as well as time, but analysing the
space requirements of a functional algorithm can be a complicated process, so we
will ignore it almost entirely. The three tools we need are: asymptotic notation
for describing the growth of functions; recurrence relations for estimating running
times; and the notion of amortised running times.


2.1 Asymptotic notation 

Asymptotic notation is used to compare the order of growth of functions without 
worrying about the constants involved. There are three kinds of asymptotic notation: 
Θ, O, and Ω (‘big theta’, ‘big omicron’, and ‘big omega’).

Let f and g be two functions taking natural numbers as arguments and returning
nonnegative results, not necessarily integers. We say that f is of order g, and write
f = Θ(g), if there are positive constants C and D and a number n₀ such that

    C g(n) ≤ f(n) ≤ D g(n)

for all n > n₀. For example¹

    n(n+1)/2 = Θ(n²)

    n³ + n² + n log n = Θ(n³)

    n(1 + 1/2 + 1/3 + ··· + 1/n) = Θ(n log n)
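
As a worked check of the first example, one possible choice of constants (an illustrative assumption; any valid pair would do) is C = 1/2 and D = 1, with n₀ = 0:

```latex
\[
  \tfrac{1}{2}\,n^2 \;\le\; \frac{n(n+1)}{2} \;\le\; n^2
  \qquad \text{for all } n \ge 1 .
\]
```

The left inequality holds because n² ≤ n² + n, and the right one because n + 1 ≤ 2n whenever n ≥ 1.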

The notation is abused to the extent that we write f(n) = Θ(g(n)) rather than the
more correct f = Θ(g). In particular, Θ(1) stands for an anonymous function whose
values lie between two positive constants. For instance, we can be confident that
taking the head of a list is a constant-time operation. Exactly what this constant is
does not really matter - well, as long as it is small. Instead we say that head takes
Θ(1) steps.

¹ In this book, except where otherwise stated, logarithms are taken in base two, so log n means log₂ n.

If we want only to put an upper bound on the values of a function, then we can
use O notation. We say that f is of order at most g, and write f = O(g), if there is a
positive constant C and a natural number n₀ such that

    f(n) ≤ C g(n)

for all n > n₀. In particular, O(1) stands for an anonymous function whose values
are bounded above by some positive constant. For example, the running time of
takeWhile on a list of length n is O(n) steps, assuming the test takes constant time.
In the worst case the running time is Θ(n) steps but in the best case, when the first
element does not pass the test, the running time is Θ(1) steps.

A running time of O(n²) does not imply that the running time is not also O(n);
so to claim, for example, that a sorting algorithm is inefficient because its running
time is O(n²) is mathematically illiterate. Instead we can use Ω notation. We say
that f is of order at least g, and write f = Ω(g), if there is a positive constant C and
a natural number n₀ such that

    f(n) ≥ C g(n)

for all n ≥ n₀. It follows that f = Θ(g) if and only if f = O(g) and f = Ω(g). Use of
Ω notation is therefore for putting lower bounds on the values of a function. It is
legitimate to assert that a sorting algorithm is inefficient if its running time is Ω(n²)
in the worst case. As we will see in due course, this statement is correct as well as
legitimate, because there are sorting algorithms with superior running times.

Of the three kinds of asymptotic estimate, Θ notation and O notation are the ones
we will use the most. For example, we can estimate sums such as Σ_{k=1}^{n} k = Θ(n²)
and Σ_{k=1}^{n} k² = Θ(n³) without bothering about the exact constants involved.

There are two dangerous bends one has to navigate with asymptotic notation. 
Firstly, the equality sign in f = Θ(g) is not true equality with all the attendant
properties that equality entails. For instance, n² = Θ(n²) and n² + n = Θ(n²) does
not imply n² = n² + n. With Θ notation the equality goes one way, from the sharper
to the looser estimate. Another way to define Θ notation is to say that Θ(g) denotes
the set of all functions f with the stated property, and to write “f ∈ Θ(g)” instead of
“f = Θ(g)”. However, use of one-way equality rather than inclusion is traditional
and there is no compelling reason to break from it. Consequently, we will never
write Θ(g) = f. We can, however, write

    n² + Θ(n) = Θ(n²)

for this assertion does make perfect mathematical sense.

The second danger concerns reasoning about asymptotic notation. We cannot
reasonably assert, for example, that

    Θ(1) + Θ(2) + ··· + Θ(n) = Θ(n²)

because the left-hand side does not have a clear meaning. But we can write

    Σ_{i=1}^{n} Θ(i) = Θ(n²)

because there is only one occurrence of Θ on the left and we assume that it stands
for a single anonymous function of i.

We say that the running time of an algorithm is linear if it takes Θ(n) steps for
an input of size n, quadratic if it takes Θ(n²) steps, and so on. However, to claim
that an algorithm is linear should, strictly speaking, mean that it takes this time in
all cases. That is true for a function like reverse, but often the time taken differs in
different cases. Use of Θ notation is therefore usually confined to one particular
case. For instance, we might say that an algorithm takes Θ(n²) steps in the worst
case, but only Θ(n) steps in the best case. Best- and worst-case times are rarely the
same. Most often we concentrate on the worst-case running time of an algorithm, 
only occasionally mentioning the best case. 

There are two other measures of an algorithm’s performance: the time it takes in 
the average case, and a measure of how common the average case can be expected to 
be. For any average-case analysis we have to assume some probability distribution 
of the input values. For example, we could analyse the average case of a sorting 
algorithm by assuming that the input is a permutation of 1 to n and that all such 
permutations are equally likely. That may or may not be a sensible assumption 
to make. Average-case analysis is a fascinating subject and can involve some 
sophisticated mathematics, but in what follows we will almost always ignore this 
measure of running time. 


2.2 Estimating running times 

So far we have used but not defined the phrase “the running time of a function”. It 
is, of course, a function of the input size that measures the number of basic steps 
executed before the result is determined. The difficulty is that we simply do not 
know what the notion of a basic step means, at least not without looking closely at 
the details of a particular Haskell compiler and the architecture of the machine on 
which the algorithm is executed. The simplest alternative, and the one we will use, 
is to count reduction steps. Haskell evaluates an expression by reducing it to normal 
form and printing the result. At each step some reducible expression, or redex, is 
selected and simplified, by applying definitions supplied by the programmer or built-in
operations like +. However, not all reduction steps take exactly the same time,
nor does counting them take into account the time required to find the next redex in
a large and possibly complicated expression, and so the number of reduction steps
is not a completely faithful measure of time. Alternatively, we could look at the
elapsed time between the start and finish points, but again such a measure depends
on the particular computer on which the function is evaluated. GHCi, a Haskell
interpreter that comes bundled with the Haskell Platform, does provide statistics of 
performance if requested, including a measure of the elapsed time. Hugs, an earlier 
interpreter for Haskell, counted reduction steps. 

The second difficulty with estimating running times is that Haskell is a lazy 
language and evaluates expressions only as far as necessary to obtain the required 
value. For instance, in the evaluation of f (g x) it may or may not be the case that
g x is evaluated fully in order to determine the value computed by f. We will see an
example or two of this phenomenon in the following section. However, in most of
the algorithms in this book every subexpression will be fully evaluated at some point
in the computation, so the laziness of Haskell is not critical for timing purposes.
Instead we will assume eager evaluation for the purposes of timing. In particular,
if T_g(n) is an estimate of the number of reduction steps required to compute g on
an input of size n, for some appropriate definition of size, and g on such an input
returns a value of size m, and T_f(m) is similarly the number of steps required to
compute f on an input of size m, then the running time T(n) of the computation of
f · g on an input of size n is given by

    T(n) = T_g(n) + T_f(m)

Since lazy evaluation never requires more reduction steps than eager evaluation, any 
upper bound of the running time of a function under eager evaluation will also be 
an upper bound under lazy evaluation. 

In order to count, or at least estimate, the number of reduction steps in the evaluation
of a recursive function, we need the idea of a recurrence relation. Associated
with every recursively defined function is another recursively defined function for
estimating the first function’s running time. The definition of the second function is
usually referred to as a recurrence relation. Sometimes the relation is an equality
(=), sometimes an inequality (≤ or ≥), depending on whether we are seeking exact,
upper, or lower bounds on the running time. To solve a recurrence relation means to
find some way of expressing the function involved in a closed form.

For example, here is a simple recurrence relation for the running time T of some
algorithm as a function of the input size n:

    T(n) = T(n-1) + Θ(n)

There is no real need to state the base case T(0) = Θ(1). The solution is given by

    T(n) = Σ_{k=0}^{n} Θ(k) = Θ(n²)

The solution is not quite so obvious with the recurrence

    T(n) = 2T(n-1) + Θ(n)                                        (2.1)

One way to solve (2.1) is to unfold it to see if a general pattern appears. Replacing
Θ(n) by cn to avoid tripping up over multiple Θs, we have

    T(n) = cn + 2T(n-1)
         = cn + 2(c(n-1) + 2T(n-2))
         = cn + 2c(n-1) + 4c(n-2) + ··· + 2^{n-1} c + 2ⁿ T(0)

and so

    T(n) = c Σ_{k=0}^{n-1} 2^k (n-k) + 2ⁿ T(0)
         = c Σ_{k=1}^{n} k 2^{n-k} + 2ⁿ T(0)
         = c 2ⁿ Σ_{k=1}^{n} k/2^k + 2ⁿ T(0)
         = Θ(2ⁿ)

since Σ_{k=1}^{n} k/2^k < 2. Strictly speaking, the above calculation solves only the
recurrence

    T(n) = 2T(n-1) + cn

However, (2.1) can be replaced by two recurrence relations:

    T(n) ≤ 2T(n-1) + c₂ n
    T(n) ≥ 2T(n-1) + c₁ n

and the reasoning above can be repeated with inequalities instead of equalities,
showing that T(n) = O(2ⁿ) and T(n) = Ω(2ⁿ), and thus T(n) = Θ(2ⁿ). There is no real
harm in replacing Θ(f(n)) by c × f(n), and we shall continue to do so.
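
The unfolding argument for T(n) = 2T(n-1) + cn can be checked numerically. The sketch below fixes c = 1 and T(0) = 1; both values are assumptions chosen only to make the recurrence concrete, and for them the exact solution works out to 3 × 2ⁿ - n - 2, which is indeed Θ(2ⁿ).

```haskell
-- The recurrence T(n) = 2 T(n-1) + c n with the assumed values c = 1, T(0) = 1.
-- Defined for natural-number arguments only.
t :: Integer -> Integer
t 0 = 1
t n = 2 * t (n - 1) + n

-- Closed form for these particular constants: 3 * 2^n - n - 2.
closed :: Integer -> Integer
closed n = 3 * 2 ^ n - n - 2
```

Comparing t n with closed n for a range of small n, for instance in GHCi, confirms that the two agree.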

Next, consider the recurrence 

    T(n) = nT(n-1) + Θ(n)

This time we have

    T(n) = cn + nT(n-1)
         = cn + n(c(n-1) + (n-1)T(n-2))
         = cn + cn(n-1) + cn(n-1)(n-2) + ···

Hence

    T(n) = c Σ_{k=1}^{n} n!/(n-k)! = Θ(n!)


Later on, when we discuss divide-and-conquer algorithms, we will encounter other 
recurrence relations and show how to solve them. 

Let us now look at some specific examples of how to time a Haskell function.
Consider again the following two definitions of the function concat:

    concat1 xss = foldr (++) [] xss
    concat2 xss = foldl (++) [] xss

Since ++ is an associative operation with the empty list as identity element, these
two definitions are equivalent provided xss is a finite list. Let T₁(m,n) and T₂(m,n)
denote asymptotic estimates of the running times of the definitions when xss is a list
of length m consisting of lists each of length n. The total length of xss is therefore
mn. Under this assumption, the worst-case and the best-case running times coincide,
so we can give asymptotic estimates without focusing on different cases.

To estimate T₁, it is best first to rewrite the definition of concat1 in explicitly
recursive terms:

    concat1 []         = []
    concat1 (xs : xss) = xs ++ concat1 xss

The recursive definition of concat1 leads to the following definition of T₁:

    T₁(0,n)   = Θ(1)
    T₁(m+1,n) = T₁(m,n) + C(n,mn)

where C(m,n) is the time taken to concatenate a list of length m with a list of length
n. Here the base case is necessary. The cost of ++ is proportional to the length of the
first argument, so C(m,n) = Θ(m). That means we can replace the second equation
in the definition of T₁ by

    T₁(m+1,n) = T₁(m,n) + Θ(n)

Now we have

    T₁(m,n) = Σ_{k=0}^{m} Θ(n) = Θ(mn)

The running time of concat1 is therefore linear in the total length of the input, the
best we can expect. Turning to concat2, we again first rewrite the definition as an
explicitly recursive function:

    concat2 xss = step [] xss
    step ws []         = ws
    step ws (xs : xss) = step (ws ++ xs) xss

Here step is an abbreviation for foldl (++). That leads to the recurrence

    T₂(m,n)    = S(0,m,n)
    S(k,0,n)   = Θ(1)
    S(k,m+1,n) = S(k+n,m,n) + Θ(k)

where S(k,m,n) is the cost of evaluating step ws xss when ws has length k, and Θ(k)
accounts for the ++ operation in the recursive definition of step. We have

    S(k,m,n) = Σ_{j=0}^{m-1} Θ(k + jn) = Θ(km + m²n)

and so T₂(m,n) = Θ(m²n). The running time of concat2 is therefore not linear in the
total length of the input. Experiments with GHCi confirm the difference in running
times:


> sum $ concat2 (replicate 2000 (replicate 100 1)) 

200000 

(2.84 secs, 14,502,482,208 bytes) 

> sum $ concat1 (replicate 2000 (replicate 100 1))

200000 

(0.03 secs, 29,292,184 bytes) 

By the way, experiments such as these are useful guides for guessing the running 
time of a function. Try running the function on inputs of sizes n, 2n,4n, and so on. 
If the elapsed time doubles at each run, then the function takes linear time; if the 
elapsed time is greater by a factor of four at each step, then the function probably 
takes quadratic time; and if the time is greater by a factor of eight, then the function 
takes cubic time. And so on. 
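
Elapsed times vary from machine to machine, so it can also help to count steps in a simple model. The sketch below assumes that evaluating xs ++ ys costs exactly length xs steps and ignores everything else; costConcat1 and costConcat2 are hypothetical names for the resulting step counts on m lists each of length n.

```haskell
-- The two definitions of concat under discussion.
concat1, concat2 :: [[a]] -> [a]
concat1 = foldr (++) []
concat2 = foldl (++) []

-- Modelled cost of concat1: each of the m uses of ++ has a first argument of length n.
costConcat1 :: Int -> Int -> Int
costConcat1 m n = sum (replicate m n)

-- Modelled cost of concat2: at step j the accumulated first argument has length j * n.
costConcat2 :: Int -> Int -> Int
costConcat2 m n = sum [j * n | j <- [0 .. m - 1]]
```

For m = 2000 and n = 100 the model gives 200,000 steps for concat1 and 199,900,000 for concat2, in line with the Θ(mn) and Θ(m²n) estimates.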

Here is another example. Consider again the two permutation-generating functions
of the previous chapter:

    perms1 = foldr (concatMap · inserts) [[]]
    inserts x []       = [[x]]
    inserts x (y : ys) = (x : y : ys) : map (y:) (inserts x ys)

    perms2 [] = [[]]
    perms2 xs = concatMap subperms (picks xs)
      where subperms (x, ys) = map (x:) (perms2 ys)
    picks []       = []
    picks (x : xs) = (x, xs) : [(y, x : ys) | (y, ys) ← picks xs]

Let $T_1(n)$ and $T_2(n)$ be the running times of perms1 and perms2 on a list of length n.
The function $T_1$ satisfies the recurrence relation

$$T_1(n+1) = T_1(n) + n!\,(I(n) + \Theta(n^2))$$

The function $I(n)$ is the time to compute the list of insertions of a new element in
a permutation of length n. There are n+1 results, each of which is a list of length
n+1, and it takes $\Theta(n^2)$ steps to concatenate them. Finally, there are n! permutations of
a list of length n, so the insertions are computed n! times. Now

$$I(n+1) = I(n) + \Theta(n)$$

There are n+1 ways to add a new element to a list of length n, so it takes $\Theta(n)$ steps
to perform the map operations. That gives $I(n) = \Theta(n^2)$, and $I(n)\,n! = \Theta((n+2)!)$,
so we have

$$T_1(n+1) = T_1(n) + \Theta((n+2)!)$$

Hence

$$T_1(n) = \sum_{k=0}^{n-1} \Theta((k+2)!) = \sum_{k=2}^{n+1} \Theta(k!)$$

But

$$\sum_{k=0}^{n} k! = n!\left(1 + \frac{1}{n} + \frac{1}{n(n-1)} + \cdots + \frac{1}{n!}\right) = \Theta(n!)$$

Therefore $T_1(n) = \Theta((n+1)!)$ and so the running time of perms1 is proportional to
the total length of the output, namely $n \times n!$.
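The claim that a sum of factorials is dominated by its last term can be checked with exact arithmetic. This is a small sketch only; the names factorial and sumRatio are ours and do not appear in the text.

```haskell
import Data.Ratio ((%))

factorial :: Integer -> Integer
factorial n = product [1 .. n]

-- (0! + 1! + ... + n!) divided by n!, as an exact rational number.
sumRatio :: Integer -> Rational
sumRatio n = sum (map factorial [0 .. n]) % factorial n
```

The ratio never drops below 1 and never exceeds 2 for the values tried, which is consistent with the sum being $\Theta(n!)$.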

Turning to $T_2(n)$, we have the recurrence relation

$$T_2(n) = P(n) + n\,(T_2(n-1) + \Theta(n!)) + \Theta((n+1)!)$$

where $P(n)$ is the time taken to compute picks. The second term accounts for the
total time spent computing n evaluations of subperms, and the final term is the cost
of the concat operation, which takes linear time in the length of the output. The
function $P(n)$ satisfies

$$P(n) = P(n-1) + \Theta(n)$$

where the $\Theta(n)$ term accounts for the map operations implicit in the list comprehension,
so $P(n) = \Theta(n^2)$, which is dominated by the other terms. Thus

$$T_2(n) = n\,T_2(n-1) + \Theta((n+1)!)$$

To solve this recurrence we can guess that $T_2(n)$ takes the form

$$T_2(n) = f(n)\,n!$$

for some function f. Then we have

$$f(n)\,n! = n\,f(n-1)\,(n-1)! + \Theta((n+1)!)$$

which on division by $n!$ gives $f(n) = f(n-1) + \Theta(n)$. Hence $f(n) = \Theta(n^2)$ and
$T_2(n) = \Theta((n+2)!)$. Thus the running time of perms2 is a factor of n greater than
the running time of perms1.


2.3 Running times in context 

As we said above, Haskell is a lazy language that evaluates expressions only as far 
as necessary to obtain the answer. In this section we take a brief look at some of the 
consequences of lazy evaluation. The material is not necessary for understanding 
the rest of the book, so it can be skipped, especially at a first reading. 

The major point at issue is that we have to be careful about what running time 
we actually assign to a given function when it occurs in a particular context. For
instance we have seen that evaluation of concat1 on a list of length m of lists all
of length n takes $\Theta(mn)$ steps. However, evaluating head . concat1 does not take
this time. In fact, evaluation of head . concat1 on a nonempty list of nonempty lists
proceeds essentially as follows:

      head (concat1 ((x:xs):xss))
    = head ((x:xs) ++ concat1 xss)
    = head (x : (xs ++ concat1 xss))
    = x

There are only $\Theta(1)$ reduction steps. The running time of head . concat2 is, however,
$\Theta(m)$ steps because it takes this time to reduce

    (([] ++ xs1) ++ xs2) ++ ... ++ xsm

to xs1 ++ ... before the head of xs1 can be extracted.
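The difference can be observed directly on an infinite input. The sketch below takes concat1 and concat2 to be the foldr and foldl versions of concatenation; the name infinite is ours.

```haskell
concat1, concat2 :: [[a]] -> [a]
concat1 = foldr (++) []
concat2 = foldl (++) []

-- An infinite list of nonempty lists: [1], [2], [3], ...
infinite :: [[Int]]
infinite = map (: []) [1 ..]
```

Here head (concat1 infinite) returns 1 after a constant number of steps, whereas head (concat2 infinite) never terminates, because foldl has to traverse the whole outer list before any ++ can be reduced.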

Here is another instructive example. Consider the functions inits and tails:

    inits, tails :: [a] -> [[a]]
    inits []     = [[]]
    inits (x:xs) = [] : map (x:) (inits xs)
    tails []     = [[]]
    tails (x:xs) = (x:xs) : tails xs

For instance

    inits "abcd" = ["", "a", "ab", "abc", "abcd"]
    tails "abcd" = ["abcd", "bcd", "cd", "d", ""]

The running times $I(n)$ and $T(n)$ of inits and tails satisfy

$$
\begin{aligned}
I(n+1) &= I(n) + \Theta(n) \\
T(n+1) &= T(n) + \Theta(1)
\end{aligned}
$$

with solutions $I(n) = \Theta(n^2)$ and $T(n) = \Theta(n)$. However, for both functions there
are $\Theta(n^2)$ symbols in the output for a list of length n, so it takes this time to print
the result. The difference between the running times of inits and tails emerges when
we want to produce some function of the final list rather than the final list itself.
For instance one can count the number of suffixes in linear time, but it requires
quadratic time to count the prefixes:

    > length $ tails [1..10000]
    10001
    (0.00 secs, 2,407,104 bytes)
    > length $ inits [1..10000]
    10001
    (1.39 secs, 5,901,977,160 bytes)
To compute the length of a list one only has to know the number of elements, not the
values of the elements themselves. The much greater space required in the second
evaluation is due to the fact that evaluation of inits builds up a long chain of maps:

    [[],
     map (1:) [[]],
     map (1:) (map (2:) [[]]),
     ...
     map (1:) (map (2:) ... map (10000:) [[]])]

The values of these elements are not needed for computing the length of the result, 
but nevertheless the unevaluated expressions have to be stored, which is why the 
total amount of space required is about the square of the space needed for counting 
the number of suffixes. We will return to inits and tails in the following chapter. 

Here is one more example. Consider the cost of computing length . perms1, where
perms1 was defined in the previous section. Since only the number of permutations
has to be found, no expression denoting an individual permutation has to be
evaluated and the running time, $L(n)$ say, satisfies the recurrence relation

$$L(n+1) = L(n) + n!\,I(n)$$

where $I(n) = \Theta(n^2)$, the same as before. The recurrence has the same asymptotic
solution as before, so it takes $\Theta((n+1)!)$ steps to compute the number of permutations.
This time can be brought down to $\Theta(n!)$ steps by redefining inserts. The trick
is to use an accumulating function:

    inserts x = help id x
    help f x []     = [f [x]]
    help f x (y:ys) = f (x:y:ys) : help (f . (y:)) x ys

For example,

      help id 1 [2,3]
    = id [1,2,3] : help (2:) 1 [3]
    = id [1,2,3] : (2:) [1,3] : help ((2:) . (3:)) 1 []
    = id [1,2,3] : (2:) [1,3] : (2:) ((3:) [1]) : []

It now takes only $\Theta(n)$ steps to count the number of insertions, so

$$L(n+1) = L(n) + \Theta((n+1)!)$$

with the solution $L(n) = \Theta(n!)$. However, the total running time $T_1(n)$ of perms1 is
not affected by this change.
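As a quick check that the accumulating version computes the same insertions, here are both definitions side by side. The primed name inserts' is ours, chosen only so that the two versions can coexist in one file.

```haskell
-- Original definition, kept for comparison.
inserts :: a -> [a] -> [[a]]
inserts x []     = [[x]]
inserts x (y:ys) = (x:y:ys) : map (y:) (inserts x ys)

-- Accumulating version: f records the conses that still have to be applied.
inserts' :: a -> [a] -> [[a]]
inserts' = help id
  where
    help f x []     = [f [x]]
    help f x (y:ys) = f (x:y:ys) : help (f . (y:)) x ys
```

Counting the results of inserts' x ys builds the spine of the result list without applying any of the accumulated functions, which is where the saving comes from.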

2.4 Amortised running times 

Sometimes the total cost of a sequence of n operations is $O(n)$ steps even though
the cost of an individual operation is not $O(1)$. Consider the function

    build :: (a -> a -> Bool) -> [a] -> [a]
    build p = foldr insert [] where insert x xs = x : dropWhile (p x) xs

For example, build (==) removes adjacent duplicates from a list:

    build (==) [4,4,2,1,1,1,2,5] = [4,2,1,2,5]

The running time $I(n)$ of insert on a list of length n is $O(n)$ steps, assuming evaluation
of p takes constant time. Hence the running time $B(n)$ of build on a list of
length n satisfies

$$B(n+1) = B(n) + O(n)$$

with solution $B(n) = O(n^2)$. The running time of insert is certainly not constant
because dropWhile can take $\Omega(n)$ steps when applied to a list of length n. It therefore
appears that the assertion $B(n) = O(n^2)$ is the best we can say about the running
time of build p.

In fact the running time of build p is $O(n)$ steps, not just $O(n^2)$ steps. To see
why, observe that each element added to the list can be dropped at most once. Thus
the total number of elements that can be dropped is at most the total number of
elements that can be added, namely n. That gives a total running time of $O(n)$ steps.
The amortised cost of a single operation is obtained by dividing the total cost of
the operations by the number of such operations, namely n. The amortised cost of
insert in the computation of build is therefore $O(1)$ steps. Note that no assumption
about probability distributions is involved in this analysis.
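The counting argument can be made concrete by instrumenting build. The sketch below is our own variant, named buildCount, which returns the result together with the number of evaluations of p; it uses span in place of dropWhile so that the dropped elements can be counted.

```haskell
-- build from the text, instrumented to count evaluations of p.
buildCount :: (a -> a -> Bool) -> [a] -> ([a], Int)
buildCount p = foldr insert ([], 0)
  where
    insert x (xs, c) =
      let (dropped, rest) = span (p x) xs
          -- one test per dropped element, plus one failing test if any remain
          tests = length dropped + (if null rest then 0 else 1)
      in (x : rest, c + tests)
```

Each insertion performs at most one test more than the number of elements it drops, and at most n elements are dropped altogether, so the count never exceeds 2n.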

As another example, consider the following function which increments a binary 
integer a given number of times: 

    bits :: Int -> [[Bit]]

    bits n = take n (iterate inc [])
      where inc []     = [1]
            inc (0:bs) = 1 : bs
            inc (1:bs) = 0 : inc bs

The Standard Prelude function iterate generates an infinite list:

    iterate :: (a -> a) -> a -> [a]
    iterate f x = x : iterate f (f x)

The function inc increments a binary integer, written in reverse order with the least
significant bit first: for example, inc 101 = 011 and inc 111 = 0001. How long does
it take to compute bits n? Since the running time of inc on a list of length k is $\Omega(k)$
in the worst case (when all the bits are 1), it seems that the best we can say about the
total cost of n increments is that it takes $O(n^2)$ steps. Certainly it takes this time both
to compute and print the result, but we are concerned only with the computation
time. In fact the cost is $O(n)$ steps. To see why, observe that in half of the cases only
the first bit is changed; in a quarter of the cases only the first two bits are changed;
in an eighth of the cases only the first three bits are changed; and so on. Taking the
cost of changing a bit to be 1 step, the total cost is

$$\frac{n}{2} + \frac{2n}{4} + \frac{3n}{8} + \cdots = O(n)$$

The amortised cost of each increment is therefore $O(1)$ steps.
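The total number of bit changes can be counted directly. In this sketch we assume that Bit is a synonym for Int, and the names flips and totalFlips are ours.

```haskell
type Bit = Int   -- assumption: bits are represented by the integers 0 and 1

inc :: [Bit] -> [Bit]
inc []       = [1]
inc (0 : bs) = 1 : bs
inc (1 : bs) = 0 : inc bs
inc _        = error "inc: not a bit"

bits :: Int -> [[Bit]]
bits n = take n (iterate inc [])

-- Number of bits changed by inc: the leading 1s plus the bit that is set to 1.
flips :: [Bit] -> Int
flips bs = length (takeWhile (== 1) bs) + 1

-- Total number of bit changes made by n successive increments starting at [].
totalFlips :: Int -> Int
totalFlips n = sum (map flips (bits n))
```

For n = 1000 the total stays below 2n, in line with an amortised cost of at most 2 per increment.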

The most important use of amortised costs occurs when building data structures 
of various kinds. We will see examples in the following chapter. Typical operations 
on data structures include inserting an element into the structure, deleting an
element, and perhaps merging two structures in some way. When each individual
operation is considered in isolation, upper and lower bounds can be given for the 
cost of the operation. However, when some computation involves a sequence of n 
such operations, whether they are grouped together or distributed throughout the 
computation, the total cost as a function of n may be lower than the sum of the 
individual estimates of the costs of each operation in the sequence. 

The amortised costs of build p and a sequence of inc operations were obtained 
using different insights, but there is a more uniform way of computing amortised 
costs, not restricted to costs that are $O(1)$. We will need this method in the following
chapter. To do so we first change the cost model to one that counts the costs of
operations in terms of definite integers rather than asymptotic notation. For example
with build p we can charge a cost of 1 for each evaluation of p (remember p is
assumed to be a constant-time operation) and 1 for each cons operation. The actual
running times are proportional to these costs. Similarly, when a bit sequence begins
with exactly t bits set to 1, the cost of inc is defined to be t + 1.

Now suppose n successive applications of some function f applied to $x_0$ produce
a sequence of values $x_0, x_1, \ldots, x_n$. Let $C(x_i)$ be the cost of computing f on input $x_i$
and $A(x_i)$ the amortised cost. The aim is to show

$$\sum_{i=0}^{n-1} C(x_i) \le \sum_{i=0}^{n-1} A(x_i) \qquad (2.2)$$

In particular, if $A(x_i) = O(1)$, then the total cost of n operations is $O(n)$.

To establish (2.2) we construct a function S, a 'size' function that returns nonnegative
integers, and show for some appropriate definition of A that

$$C(x_i) \le S(x_i) - S(x_{i+1}) + A(x_i) \qquad (2.3)$$

for $0 \le i < n$. In words, the cost of f on an input is bounded above by the difference
in sizes between the input and output, plus the amortised cost. The inequality (2.3)
can be summed, giving

$$\sum_{i=0}^{n-1} C(x_i) \le S(x_0) - S(x_n) + \sum_{i=0}^{n-1} A(x_i)$$

so (2.2) is certainly satisfied if $S(x_0) = 0$.

Here are our two examples again. In the case of build p we take S(xs) = length xs.
Let $C(xs_i)$ be the cost of computing $xs_{i+1}$ from $xs_i$, counting a cost of 1 for each p or cons
operation. We have

$$C(xs_i) = \mathit{length}\ xs_i - \mathit{length}\ xs_{i+1} + 2$$

The cost is positive because if the output is longer than the input, it can only be so
by one element. Hence we can take A(xs) = 2.

In the case of inc we can define S(bs) to be the number of bits in bs that are set
to 1. If there are $b_1$ bits set to 1 before an inc operation, including t initial bits, and
$b_2$ bits set after the operation, then $b_2 = b_1 - t + 1$, equivalently $t + 1 = b_1 - b_2 + 2$.
Hence

$$C(bs_i) = S(bs_i) - S(bs_{i+1}) + 2$$

and so A(bs) = 2.
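Inequality (2.3) can be tested mechanically along a run of operations. The following sketch is ours: amortisedOK is a hypothetical helper that takes the function, its cost, a size function and a proposed amortised cost, and checks (2.3) for the first n steps. It is instantiated with the inc example, representing bits as Ints.

```haskell
-- Check inequality (2.3), C(x_i) <= S(x_i) - S(x_{i+1}) + A(x_i),
-- along the first n steps of x_{i+1} = f x_i starting from x0.
amortisedOK :: (a -> a) -> (a -> Int) -> (a -> Int) -> (a -> Int) -> Int -> a -> Bool
amortisedOK f cost size amort n x0 = and (zipWith ok xs (drop 1 xs))
  where
    xs      = take (n + 1) (iterate f x0)
    ok x x' = cost x <= size x - size x' + amort x

-- The inc example: bits are Ints, least significant first (an assumption).
inc :: [Int] -> [Int]
inc []       = [1]
inc (0 : bs) = 1 : bs
inc (_ : bs) = 0 : inc bs

incCost, ones :: [Int] -> Int
incCost bs = length (takeWhile (== 1) bs) + 1   -- the cost t + 1
ones       = length . filter (== 1)             -- the size S(bs)
```

With an amortised cost of 2 the check succeeds; with 1 it fails, since the cost of inc is exactly the size difference plus 2.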

Here is one more example. Consider the function

    prune :: ([a] -> Bool) -> [a] -> [a]
    prune p = foldr cut [] where cut x xs = until done init (x:xs)
                                 done xs  = null xs || p xs

The value of cut x xs is the result of repeatedly dropping the last element of x:xs
until a list satisfying p is obtained. For example, if ordered is the test for whether a
sequence is in ascending order, we have

    prune ordered [3,7,8,2,3] = [3,7,8]

This is obviously a very silly way to find the longest ordered prefix of a list, but
never mind; only the running time is of interest. If evaluation of p on a list of
length k takes $O(k)$ steps, then evaluation of cut on a list of length k takes $O(k^2)$
steps. So it seems that the best we can say is that prune applied to a list of length n
takes $O(n^3)$ steps. In fact, prune takes $O(n^2)$ steps, which means that the amortised
running time of cut is $O(n)$ steps.
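For experimenting with prune, here is a runnable transcription. The definition of ordered is our assumption about the test mentioned in the text; it treats equal neighbours as being in order.

```haskell
prune :: ([a] -> Bool) -> [a] -> [a]
prune p = foldr cut []
  where
    cut x xs = until done init (x : xs)
    done xs  = null xs || p xs

-- Assumed definition: a list is ordered if each element is at most the next.
ordered :: Ord a => [a] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))
```

Evaluating prune ordered [3,7,8,2,3] proceeds from the right: the intermediate results of cut are [3], [2,3], [8], [7,8] and finally [3,7,8].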

To see why, suppose the result of cut on a list of length $k_1$ is a list of length $k_2$,
where $0 \le k_2 \le k_1 + 1$. Suppose we charge evaluation of each of done and init on a
list of length k as k units. Then init is performed $k_1 + 1 - k_2$ times with a total cost
of

$$(k_1+1) + k_1 + (k_1-1) + \cdots + (k_2+1) = \frac{(k_1+1)(k_1+2)}{2} - \frac{k_2(k_2+1)}{2}$$

Since done is performed one more time than init, for a cost of $k_2$, its total cost is

$$\frac{(k_1+1)(k_1+2)}{2} - \frac{k_2(k_2-1)}{2}$$

Summing these two quantities and adding in 1 unit for the cons operation, we have
that the total cost of cut on a list of length $k_1$ is

$$(k_1+1)(k_1+2) - k_2^2 + 1 = k_1^2 - k_2^2 + 3(k_1+1)$$

We can therefore take

$$
\begin{aligned}
S(xs_i) &= (\mathit{length}\ xs_i)^2 \\
A(xs_i) &= 3 \times (\mathit{length}\ xs_i + 1)
\end{aligned}
$$

to satisfy (2.3). But no list can have length greater than n, so $A(xs_i) = O(n)$ and the
total cost of prune on a list of length n is $O(n^2)$ steps.

2.5 Chapter notes 

As will be appreciated, a fair amount of combinatorial mathematics is involved in 
the analysis of running times. We have mentioned sums and factorials, but later on 
we will also need floors and ceilings, modulus operations, and binomial coefficients. 
The best source book for these concepts is [1]. The history of asymptotic notation is 
discussed in [2] and also appears in [3]. To complete a quartet of books authored 
by Donald Knuth, the inventor of the name ‘Analysis of Algorithms’, we also 
recommend [4]. 

There are a number of algorithms for generating permutations, and a comprehensive
review can be found in Section 7.2.1.2 of [5], yet another book by
Knuth. The method of choice used in Data.List is one that achieves maximal 
laziness. For a fascinating discussion of this rather complicated definition, see 
http://stackoverflow.com/questions/24484348/. 
Exercises

Exercise 2.1 Is the assertion $f(n) = O(1)$ the same as the assertion $f(n) = \Theta(1)$?

Exercise 2.2 Are the following two assertions true?

$$
\begin{aligned}
O(f(n) \times g(n)) &= f(n) \times O(g(n)) \\
O(f(n) + g(n)) &= f(n) + O(g(n))
\end{aligned}
$$

Exercise 2.3 Prove formally that $(n+1)^2 = \Theta(n^2)$ by exhibiting the necessary
constants.

Exercise 2.4 What are the exact values of the sums $\sum_{k=1}^{n} k$ and $\sum_{k=1}^{n} k^2$?

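Exercise 2.5 Some of the following are correct and some are wrong. Which are
which?

$$
\begin{aligned}
2n^2 + 3n &= \Theta(n^2) \\
2n^2 + 3n &= O(n^3) \\
n \log n &= O(n\sqrt{n}) \\
n + \sqrt{n} &= O(\sqrt{n} \log n) \\
\sum_{k=1}^{n} 1/k &= O(\log n) \\
2^{\log n} &= O(n) \\
\log(n!) &= \Theta(n \log n)
\end{aligned}
$$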


Exercise 2.6 Sums of the form $\sum_{k=0}^{n} k x^k$ for various x come up surprisingly often
in the analysis of running times. One way of finding the solution is to start with the
simpler geometric series

$$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$$

which is valid provided $x \ne 1$, and to differentiate both sides with respect to x. Using
the fact that the derivative of a sum is the sum of its derivatives, carry out this
differentiation and hence estimate the sums $\sum_{k=0}^{n} k 2^k$ and $\sum_{k=0}^{n} k/2^k$.

Exercise 2.7 Using $\Theta$ notation, estimate the sum $\sum_{k=1}^{n} k \log k$.

Exercise 2.8 Solve the recurrence relation

$$
\begin{aligned}
T(0,n) &= \Theta(n^2) \\
T(m,n) &= T(m-1,n) + \Theta(m)
\end{aligned}
$$


Exercise 2.9 Use the fusion law of foldr to simplify head . concat1. Can the fusion
law of foldl be used to simplify head . concat2?

Exercise 2.10 Analyse the running time of perms3, where

    perms3 [] = [[]]
    perms3 xs = concatMap subperms xs
      where subperms x = map (x:) (perms3 (remove x xs))

Exercise 2.11 Do perms1, perms2, and perms3 all return the same first list? If so,
what is it?

Exercise 2.12 Can the trick of using an accumulating function work with inits?

Exercise 2.13 Suppose you are given a list of n digits and you want to find the
position, reading from left to right, of the first digit d for which $d \ge 5$. If no such
digit occurs, then the result is some negative position, say -1. In the best case the
algorithm examines one digit and in the worst case all n digits. Assuming that every
sequence of n digits is equally likely, what is the average number of digits that have
to be examined?

Exercise 2.14 Using the function iterate, give a one-line definition of the function
tails1 that returns the nonempty suffixes of a list.

Exercise 2.15 Consider the problem of maintaining a dynamic array. Apart from 
inspecting and updating the elements, suppose that new elements can be added to 
the array but only at the front. At some point the array, which is of a fixed given size, 
can become full. To solve this problem, a new array of double the size of the old 
one can be allocated and all the existing elements copied into the upper half of the 
new array, leaving space for further additions in the bottom half. Then we can carry 
on until the new array becomes full, in which case the process is repeated. Show 
that, in a sequence of add operations, each addition has an amortised cost of $O(1)$.


Answers 

Answer 2.1 Strictly speaking, no. We have $f(n) = O(1)$ if there is a positive constant
C and an integer $n_0$ such that $f(n) \le C$ for all $n > n_0$, while $f(n) = \Theta(1)$ if
there are positive constants C and D and an integer $n_0$ such that $D \le f(n) \le C$ for
all $n > n_0$. Hence the function const 0 is $O(1)$ but not $\Theta(1)$.

Answer 2.2 No, the first one is true, but the second one is false. For example, take
$f(n) = n^2$ and $g(n) = 1$. Then $h(n) = O(n^2 + 1)$ holds if there exists a C and $n_0$ such
that $h(n) \le C\,(n^2 + 1)$ for all $n > n_0$, but it does not follow that there exists a C
and $n_0$ such that $h(n) \le n^2 + C$ for all $n > n_0$.

Answer 2.3 We have $n^2 \le (n+1)^2 \le 4n^2$ for all n > 0. The second inequality
follows from the fact that $3n^2 - 2n - 1 = (3n+1)(n-1)$, which is nonnegative if
$n \ge 1$.

Answer 2.4 We have

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} \qquad\text{and}\qquad \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$$

Answer 2.5 They are all true except for $n + \sqrt{n} = O(\sqrt{n} \log n)$. The last one,
$\log(n!) = \Theta(n \log n)$, is a crude form of Stirling's approximation, which states that

$$n! = \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n} (1 + O(1/n))$$
Answer 2.6 Differentiating, we obtain

$$\sum_{k=0}^{n} k x^{k-1} = \frac{1 - (n+1)x^n + n x^{n+1}}{(1-x)^2}$$

so

$$\sum_{k=0}^{n} k x^k = \frac{x\,(1 - (n+1)x^n + n x^{n+1})}{(1-x)^2}$$

Taking x = 2, we find that the right-hand side is $\Theta(n 2^n)$, and taking x = 1/2 and
letting n tend to infinity, the right-hand side tends to the value 2.
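The closed form can be checked against the direct sum using exact rational arithmetic. This is only a sanity check, not a proof; the names direct and closed are ours.

```haskell
-- Left-hand side: the sum of k * x^k for k = 0..n, computed term by term.
direct :: Rational -> Integer -> Rational
direct x n = sum [fromInteger k * x ^ k | k <- [0 .. n]]

-- Right-hand side: the closed form obtained by differentiation (needs x /= 1).
closed :: Rational -> Integer -> Rational
closed x n =
  x * (1 - fromInteger (n + 1) * x ^ n + fromInteger n * x ^ (n + 1))
    / (1 - x) ^ (2 :: Int)
```

For x = 1/2 the closed form simplifies to $2 - (n+2)/2^n$, so the values approach 2 from below as n grows.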


Answer 2.7 We have

$$\sum_{k=1}^{n} k \log k \le \log n \sum_{k=1}^{n} k = O(n^2 \log n)$$

We also have

$$\sum_{k=1}^{n} k \log k \ge \sum_{k=n/2}^{n} k \log k \ge \log(n/2) \sum_{k=n/2}^{n} k = \Omega(n^2 \log n)$$

Hence the sum is $\Theta(n^2 \log n)$.

Answer 2.8 We have $T(m,n) = \Theta(m^2 + n^2)$.

Answer 2.9 To simplify head . foldr (++) [] we have to find a function op1 so that

    head (xs ++ ys) = op1 xs (head ys)

It is easy to see what the definition should be:

    op1 xs y = if null xs then y else head xs

That gives

    head . concat1 = foldr op1 ⊥

since head [] = ⊥.

To simplify head . foldl (++) [] using the fusion law of foldl we have to find a
function op2 so that

    head (xs ++ ys) = op2 (head xs) ys

But, unless xs is known to be nonempty, no such op2 exists because we would have
to have op2 ⊥ ys = head ys and op2 x ys = x.

Answer 2.10 We have

$$
\begin{aligned}
T(0) &= \Theta(1) \\
T(n) &= n\,(R(n) + T(n-1) + \Theta((n-1)!)) + \Theta((n+1)!)
\end{aligned}
$$

where $R(n)$ is the time needed for removing an element from a list of length n.
The first term is the time needed to compute all evaluations of subperms. For each
evaluation we remove an element, costing $R(n)$ steps, compute the permutations of
the resulting list, and then cons the element on the (n-1)! results. The final term
accounts for the concatenations. Since $n\,R(n) = O((n+1)!)$ we have

$$T(n) = n\,T(n-1) + \Theta((n+1)!)$$

and, as we have seen, this leads to $T(n) = \Theta((n+2)!)$, a factor of n worse than the
total length of the output.

Answer 2.11 Yes. Applied to xs, all three methods return xs as the first permutation 
(in the case of perms3, it is required that the elements of xs can be compared with
equality).

Answer 2.12 Yes. Here is the definition:

    inits = help id
      where help f []     = f [] : []
            help f (x:xs) = f [] : help (f . (x:)) xs

It now takes linear time to compute length . inits.

Answer 2.13 Since exactly half the digits are greater than or equal to 5, the algorithm
inspects just one digit half of the time, two digits a quarter of the time, and so
on. Therefore in the average case about $\sum_{k=1}^{n} k/2^k$ digits are inspected, which is
approximately 2, as we saw in Exercise 2.6. The average case is therefore only twice
as bad as the best case.

Answer 2.14 The definition is

    tails1 = takeWhile (not . null) . iterate tail

Answer 2.15 If the array has size 1 to begin with, the add operations take, in order,
times proportional to 1, 2, 1, 4, 1, 1, 1, 8, .... The total cost of n add operations is
therefore

$$n + \sum_{k=1}^{p} (2^k - 1) \le n + 2^{p+1}$$

where $2^p \le n < 2^{p+1}$. Hence the amortised cost of an add operation is $O(1)$ steps.
A similar situation occurs in Haskell. Periodically memory becomes full, and
computation is suspended while a garbage collection takes place. Thus a cons
operation is not actually guaranteed to take $O(1)$ steps in all cases.
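The cost sequence in Answer 2.15 is easy to simulate. The sketch below follows the charging scheme described there; the names addCost and totalCost are ours.

```haskell
-- Cost of the k-th add operation (k >= 1), following the sequence in the
-- answer: k units when k is a power of two (copying), otherwise 1 unit.
addCost :: Int -> Int
addCost k
  | isPowerOfTwo k = k
  | otherwise      = 1
  where isPowerOfTwo j = j `elem` takeWhile (<= j) (iterate (* 2) 1)

-- Total cost of the first n add operations.
totalCost :: Int -> Int
totalCost n = sum (map addCost [1 .. n])
```

The first eight costs are 1, 2, 1, 4, 1, 1, 1, 8, and the total for n operations never exceeds 3n, which agrees with the bound $n + 2^{p+1}$.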



Chapter 3 


Useful data structures 


Most of the algorithms in this book can be implemented with acceptable efficiency 
using just common-or-garden lists. One or two others require more specialised data 
structures, such as binary search trees, heaps, and queues of various kinds. The 
general philosophy of this book is not to consider data structures in isolation from 
the algorithms that depend on them, so we postpone discussion of such structures to 
the appropriate time and place. However, there is a small group of interrelated data 
structures that we will introduce now. They are symmetric lists, random-access lists, 
and arrays. Each is designed in its own way to overcome an obvious deficiency in 
the running times of some of the basic operations on standard lists. 


3.1 Symmetric lists 

As we have seen, some operations on lists are lopsided with regard to efficiency: 
while adding an element to the front of a list takes constant time, adding it to the rear 
takes linear time. In what follows these two functions will be called cons and snoc;
the former is defined by cons x xs = x:xs, and the latter by snoc x xs = xs ++ [x].
Similarly, while head and tail take constant time, last and init take linear time. The 
data type known as symmetric lists overcomes this one-sidedness, guaranteeing 
amortised constant time for all six operations. The basic idea is quite simple: break 
the list into two and reverse the second half. In that way, a snoc can be implemented 
as a cons on the second half, last as a head, and so on. The problem occurs with init 
and tail when one attempts to remove an element; in some cases the list has first to 
be reorganised into two new halves. 

Here are the details. A symmetric list is introduced as a pair of lists 

type SymList a = ([a], [a]) 

with the understanding that the symmetric list (xs, ys) represents the standard list
xs ++ reverse ys. That means we can convert back from symmetric lists to standard
lists by

    fromSL :: SymList a -> [a]
    fromSL (xs, ys) = xs ++ reverse ys

A function such as fromSL which converts a representation back into the structure
it is designed to represent is called an abstraction function. Using an abstraction
function we can capture the required relationship between the implementation of
an operation on the representing type and its 'abstract' definition with a simple
equation.

There is another aspect to the representation, the clever part that makes everything
fit together. The two invariants

    null xs ⇒ null ys ∨ single ys
    null ys ⇒ null xs ∨ single xs

are maintained on symmetric lists (xs, ys). Here, single is the test for a singleton list.
In words, if one or other component is the empty list, then the other component has
to be either the empty list or a singleton list. The operations on symmetric lists both
exploit this representation invariant and maintain it.

Apart from fromSL there are six other operations we are going to implement on 
symmetric lists; we will call them 

consSL, snocSL, headSL, lastSL, tailSL, initSL 

The implementations are designed to satisfy the six equations 

    cons x . fromSL = fromSL . consSL x
    snoc x . fromSL = fromSL . snocSL x
    tail . fromSL   = fromSL . tailSL
    init . fromSL   = fromSL . initSL
    head . fromSL   = headSL
    last . fromSL   = lastSL

Here are the definitions of snocSL and lastSL:

    snocSL :: a -> SymList a -> SymList a
    snocSL x (xs, ys) = if null xs then (ys, [x]) else (xs, x:ys)

    lastSL :: SymList a -> a
    lastSL (xs, ys) = if null ys then head xs else head ys

Both of these definitions make use of, and maintain, the representation invariant. In
the case of snocSL we cannot just add x to the front of ys, because that would break
the invariant if xs happens to be the empty list and ys a singleton. But when xs is
empty we can return (ys, [x]), because ys is either empty or a singleton, and

    [] ++ reverse [] ++ [x]  = [] ++ reverse [x]
    [] ++ reverse [y] ++ [x] = [y] ++ reverse [x]

In the case of lastSL, if ys is the empty list, then xs is either empty or a singleton 
and, in the second case, the last element is the sole member of xs. We should really 
have defined lastSL to read 

    lastSL (xs, ys) = if null ys
                      then if null xs
                           then error "lastSL of empty list"
                           else head xs
                      else head ys

Otherwise we would get a confusing “head of empty list” error message when
trying to obtain the last element of an empty symmetric list. However, we will keep
the code simple by omitting error messages, using a simple ⊥ instead. Once the
definitions of these two functions are understood, there should be no difficulty with 
implementing the entirely dual functions consSL and headSL, so we will leave them 
as exercises. 

Two functions remain, and here things get interesting. The definition of tailSL is 
as follows: 

    tailSL :: SymList a -> SymList a
    tailSL (xs, ys)
      | null xs   = if null ys then ⊥ else nilSL
      | single xs = (reverse vs, us)
      | otherwise = (tail xs, ys)
      where (us, vs) = splitAt (length ys `div` 2) ys

Let us look at the three cases. In the first case, when xs is an empty list, the
representation invariant guarantees that ys is either the empty list or a singleton
list. If the former, then tailSL should really give a suitable error message rather
than simply returning ⊥. If ys is a singleton, then tailSL correctly returns the empty
symmetric list. The next easy case is the third one, in which xs is a list of length at
least two. Then we can simply drop the first element of xs without destroying the
invariant. The most interesting case is the second one, when xs is a singleton list, so
ys can be a list of any length whatsoever. Here we split ys into two halves us
and vs whose lengths differ by at most one, and then return the value (reverse vs, us).
That's correct because

    [] ++ reverse (us ++ vs) = reverse vs ++ reverse us

The implementation of initSL is entirely dual to that of tailSL and we will leave it
as another exercise. The definition of nilSL is also left as an exercise.
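To experiment with tailSL one needs definitions of the helpers that the text leaves open. The sketch below supplies plausible ones as assumptions: nilSL as a pair of empty lists, single by pattern matching, and a conversion toSL (our name) that builds a symmetric list by repeated snocSL. The ⊥ case is replaced by an explicit error so that the code compiles.

```haskell
type SymList a = ([a], [a])

fromSL :: SymList a -> [a]
fromSL (xs, ys) = xs ++ reverse ys

-- Assumed definitions of helpers that the text leaves as exercises.
nilSL :: SymList a
nilSL = ([], [])

single :: [a] -> Bool
single [_] = True
single _   = False

snocSL :: a -> SymList a -> SymList a
snocSL x (xs, ys) = if null xs then (ys, [x]) else (xs, x : ys)

tailSL :: SymList a -> SymList a
tailSL (xs, ys)
  | null xs   = if null ys then error "tailSL of empty list" else nilSL
  | single xs = (reverse vs, us)
  | otherwise = (tail xs, ys)
  where (us, vs) = splitAt (length ys `div` 2) ys

-- Hypothetical helper: build a symmetric list by repeated snocSL.
toSL :: [a] -> SymList a
toSL = foldl (flip snocSL) nilSL
```

With these definitions the equation tail . fromSL = fromSL . tailSL can be tested on small inputs, including the case in which the first component is a singleton and the second has to be split.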

Each of the operations apart from tailSL and initSL takes constant time. Although 
tailSL and initSL can take linear time in the worst case, they both take amortised 
constant time. For the proof we employ the size method of the previous chapter. Consider
a sequence of n symmetric list operations producing a sequence $x_0, x_1, \ldots, x_n$
of symmetric lists, where we suppose $x_0$ is the empty symmetric list ([], []). Recall
that we have to construct a cost function C, a size function S, and an amortised
function A to satisfy

$$C(x_i) \le S(x_i) - S(x_{i+1}) + A(x_i) \qquad (3.1)$$

For the size function S, we choose

$$S(x_i) = \mathit{abs}\,(\mathit{length}\ xs_i - \mathit{length}\ ys_i)$$

where $x_i = (xs_i, ys_i)$ and abs is the function that returns the absolute value of a
number:

    abs n = if n >= 0 then n else -n

For the amortised time we choose $A(x_i) = 2$. As to the costs of the individual
operations, we can charge a cost of 1 for each of the constant-time operations,
headSL, lastSL, consSL, and snocSL. Neither of the first two changes the symmetric
list, so (3.1) is satisfied for headSL and lastSL. Next consider snocSL, which, applied
to a symmetric list with component lengths (m, n), produces a symmetric list with
component lengths (n, 1) if m = 0 and (m, n+1) if $m \ne 0$. That means S increases or
decreases by 1 and so (3.1) is again satisfied. The same argument holds for consSL.
Finally, except in one case, both tailSL and initSL also increase or decrease S by at
most 1. The exceptional case is when one of xs or ys is a singleton and the other
has length k. In this case S has the value k-1 before the operation and at most 1
afterwards. Since $(k-1) - 1 + 2 = k$ we can therefore charge k units for the cost of
the operation in this case, again satisfying (3.1).

For a fully serviceable library of symmetric list operations, we should of course 
provide additional operations, such as nullSL and singleSL for testing whether a 
symmetric list is empty or a singleton, and lengthSL for computing the length of a 
symmetric list. These are left as exercises. 

We will illustrate the use of symmetric lists on just one example. Consider again 
the function inits from Chapter 2: 

    inits :: [a] -> [[a]]
    inits []     = [[]]
    inits (x:xs) = [] : map (x:) (inits xs)

As we have seen, computing length . inits takes quadratic time. Can we find some
other way of defining inits so that this time is reduced to linear time? The definition

    inits = map reverse . reverse . tails . reverse

achieves this aim but is unsatisfactory for another reason: what we really want is an
online algorithm for inits, so that given an infinite list, inits returns an infinite list
of its finite prefixes. The above definition is not online because one cannot reverse
an infinite list. A better definition, and the one given in Data.List in all essential
details, is to write

    inits = map fromSL . scanl (flip snocSL) nilSL

It still takes quadratic time to print all the prefixes, but only linear time to compute
length . inits (assuming of course that the input list is finite). There is another
definition of inits for which length . inits takes linear time, one that does not use
symmetric lists; we will leave that as an exercise.
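The online behaviour is easy to observe. In the following self-contained sketch the empty symmetric list is written out as ([], []), which is our assumption for nilSL.

```haskell
type SymList a = ([a], [a])

fromSL :: SymList a -> [a]
fromSL (xs, ys) = xs ++ reverse ys

snocSL :: a -> SymList a -> SymList a
snocSL x (xs, ys) = if null xs then (ys, [x]) else (xs, x : ys)

-- inits via symmetric lists; ([], []) plays the role of nilSL.
inits :: [a] -> [[a]]
inits = map fromSL . scanl (flip snocSL) ([], [])
```

Because scanl produces its results incrementally, take 3 (inits [1..]) returns the first three prefixes of an infinite list, and length (inits xs) never calls fromSL at all.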

Before leaving the subject of symmetric lists, we should mention that Haskell
provides an alternative method for providing efficient list operations in the library
Data.Sequence. This library supports a number of operations on lists, including
those above. Instead of using the idea of representing a list as two component lists,
the library is based on 2-3 finger trees, a data structure that we will not discuss.


3.2 Random-access lists 

Some algorithms, though not too many, rely on being able to retrieve the kth element
of a list for various k. Haskell provides a list-indexing operator (!!) for this purpose,
though we will rename it as fetch:

    fetch :: Nat -> [a] -> a
    fetch k xs = if k == 0 then head xs else fetch (k-1) (tail xs)

Fetching the kth element of a list takes $\Theta(k)$ steps. In this section, and also in
the following one, we discuss two methods for making fetch more efficient. In
the present section we describe a data structure known as a random-access list.
With random-access lists each of the operations cons, head, tail, and fetch takes
logarithmic time in the length of the list, that is, $O(\log n)$ steps for a list of length n.
While the performance of the first three operations deteriorates, the last one is made
more efficient. You pays your money and you makes your choice, as the saying goes.
Another important consequence of the representation is that, also in logarithmic
time, we can update an element at a specified position with a new element, an
operation that would take linear time with standard lists.

A random-access lisl is conslrucled oul of Iwo olher dala slruclures, Ihe firsl of 
which is a binary free: 

data Tree a = Leaf a \ Node (Tree a) (Tree a) 

A tree is either a leaf containing a value, or a node consisting of two subtrees. The 
size of a tree is the number of leaves in the tree: 

size (Leaf x) = 1
size (Node t1 t2) = size t1 + size t2

Some operations on trees depend on knowing the size of a tree. Since we do not
want to recompute size from scratch each time, we can install its value in the tree,
redefining Tree to read:

data Tree a = Leaf a | Node Nat (Tree a) (Tree a)

Provided size information is correctly installed each time we build a tree, we can
now define size as a selector function:

size :: Tree a -> Nat
size (Leaf x) = 1
size (Node n _ _) = n

The function node, known as a smart constructor, constructs a Node ensuring that
size information is correctly installed:

node :: Tree a -> Tree a -> Tree a
node t1 t2 = Node (size t1 + size t2) t1 t2

A binary tree can have many shapes and arbitrary sizes, but we are only going to
construct perfect binary trees in which all leaves have the same depth. For example,

t = Node 4 (Node 2 (Leaf 'a') (Leaf 'b')) (Node 2 (Leaf 'c') (Leaf 'd'))

is a perfect tree of size 4. All perfect trees have sizes of the form 2^p for some p ≥ 0.
We will see in due course how this perfection is guaranteed. 
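
The definitions so far can be collected into a small runnable sketch. The helper perfectDepth is our own addition, not part of the text: it is one possible way to check that a tree is perfect and that its cached sizes are correct.

```haskell
type Nat = Int

data Tree a = Leaf a | Node Nat (Tree a) (Tree a)
  deriving (Eq, Show)

size :: Tree a -> Nat
size (Leaf _) = 1
size (Node n _ _) = n

-- Smart constructor: installs the size of the new node.
node :: Tree a -> Tree a -> Tree a
node t1 t2 = Node (size t1 + size t2) t1 t2

-- Hypothetical helper: Just d when every leaf has depth d and every
-- cached size equals the sum of the sizes of the two subtrees.
perfectDepth :: Tree a -> Maybe Int
perfectDepth (Leaf _) = Just 0
perfectDepth (Node n t1 t2) = do
  d1 <- perfectDepth t1
  d2 <- perfectDepth t2
  if d1 == d2 && n == size t1 + size t2 then Just (d1 + 1) else Nothing

-- The example tree, built with the smart constructor.
t :: Tree Char
t = node (node (Leaf 'a') (Leaf 'b')) (node (Leaf 'c') (Leaf 'd'))
```

With these definitions perfectDepth t is Just 2, while node (Leaf 'a') t, whose two subtrees have different depths, yields Nothing.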

The second data structure is a sequence of perfect trees. But what we need is 
not just an arbitrary list of trees, but a sequence of a special kind. The sequence is 
designed to reflect the binary numerical representation described in the previous 
chapter. Consider, for example, the number 6, which in (reversed) binary notation is 
011 with the least significant bit first. The idea is to represent a six-element list, say
"abcdef", by a sequence

[Zero,
 One (Node 2 (Leaf 'a') (Leaf 'b')),
 One (Node 4 (Node 2 (Leaf 'c') (Leaf 'd'))
             (Node 2 (Leaf 'e') (Leaf 'f')))]

Similarly, 5 is 101 in binary, and a five-element list, say "abcde", is represented by

[One (Leaf 'a'),
 Zero,
 One (Node 4 (Node 2 (Leaf 'b') (Leaf 'c'))
             (Node 2 (Leaf 'd') (Leaf 'e')))]

An empty list can be represented by []. We will not allow trailing zeros in
random-access lists, so the representations are unique.

Here, finally, is the definition of a random-access list:

data Digit a = Zero | One (Tree a)
type RAList a = [Digit a]

The abstraction function fromRA converts random-access lists into standard lists:

fromRA :: RAList a -> [a]
fromRA = concatMap from
  where from Zero = []
        from (One t) = fromT t

fromT :: Tree a -> [a]
fromT (Leaf x) = [x]
fromT (Node _ t1 t2) = fromT t1 ++ fromT t2

It is possible to make fromT more efficient, but we leave that to the exercises.

The point of a random-access list is that we can skip over chunks of the list when 
looking up an element at a specified location: 

fetchRA :: Nat -> RAList a -> a
fetchRA k (Zero : xs) = fetchRA k xs
fetchRA k (One t : xs) = if k < size t
                         then fetchT k t else fetchRA (k - size t) xs

fetchT :: Nat -> Tree a -> a
fetchT 0 (Leaf x) = x
fetchT k (Node n t1 t2) = if k < m
                          then fetchT k t1 else fetchT (k - m) t2
                          where m = n `div` 2

The function fetchRA skips over trees whose elements have positions that are too
small, taking into account the number of elements it has skipped over. When a tree
is found that does contain a value at the desired position, the function fetchT is
invoked. Using the size information stored in a tree the required element can be
found either by searching the left subtree or the right subtree at each step. Provided
k is in the range 0 ≤ k < n when looking up the kth element in a list containing n
elements, we have

fetch k . fromRA = fetchRA k

Furthermore, fetchRA k takes O(log k) steps. To see this, suppose 2^p ≤ k < 2^(p+1). The
computation of fetchRA skips over p elements in O(p) steps, and then searches a
perfect binary tree of size 2^p in a further O(p) steps. A better definition of fetchRA
would produce an "index too large" error message if n ≤ k, but we will leave that
definition as an exercise.
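
To see the lookup at work, the definitions of this section can be assembled into the following self-contained sketch. The value abcdef is our name for the six-element example given earlier; everything else follows the text, with guards used in place of if-then-else.

```haskell
type Nat = Int

data Tree a = Leaf a | Node Nat (Tree a) (Tree a)
data Digit a = Zero | One (Tree a)
type RAList a = [Digit a]

size :: Tree a -> Nat
size (Leaf _) = 1
size (Node n _ _) = n

fromRA :: RAList a -> [a]
fromRA = concatMap from
  where from Zero = []
        from (One t) = fromT t

fromT :: Tree a -> [a]
fromT (Leaf x) = [x]
fromT (Node _ t1 t2) = fromT t1 ++ fromT t2

-- Skip whole trees until the one containing position k is reached.
fetchRA :: Nat -> RAList a -> a
fetchRA k (Zero : xs) = fetchRA k xs
fetchRA k (One t : xs)
  | k < size t = fetchT k t
  | otherwise = fetchRA (k - size t) xs

-- Descend left or right using the cached size.
fetchT :: Nat -> Tree a -> a
fetchT 0 (Leaf x) = x
fetchT k (Node n t1 t2)
  | k < m = fetchT k t1
  | otherwise = fetchT (k - m) t2
  where m = n `div` 2

-- The representation of "abcdef": 6 is 011, least significant bit first.
abcdef :: RAList Char
abcdef =
  [ Zero
  , One (Node 2 (Leaf 'a') (Leaf 'b'))
  , One (Node 4 (Node 2 (Leaf 'c') (Leaf 'd'))
                (Node 2 (Leaf 'e') (Leaf 'f')))
  ]
```

For instance, fetchRA 4 abcdef skips the Zero digit and the tree of size 2, and then finds 'e' at position 2 of the tree of size 4. As in the text, no clause handles an index that is too large.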

In addition to fetchRA and fromRA, five other basic operations are supported by
random-access lists: 





nullRA :: RAList a -> Bool
nilRA :: RAList a
consRA :: a -> RAList a -> RAList a
unconsRA :: RAList a -> (a, RAList a)
updateRA :: Nat -> a -> RAList a -> RAList a

The function nullRA tests whether a list is empty, nilRA returns an empty list, and 
updateRA updates a random-access list at a specified location with a new value. 
Its definition is similar to that of fetchRA and we will leave it as an exercise. The
definition of consRA stems directly from that of the inc operation of the previous
chapter:

inc [] = [1]
inc (0 : bs) = 1 : bs
inc (1 : bs) = 0 : inc bs

Here is the definition of consRA:

consRA x xs = consT (Leaf x) xs

consT t1 [] = [One t1]
consT t1 (Zero : xs) = One t1 : xs
consT t1 (One t2 : xs) = Zero : consT (node t1 t2) xs
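
The correspondence between consRA and inc can be checked with a small experiment. The sketch below is self-contained; bits is a helper of our own that forgets the trees and keeps only the digit pattern, and the last clause of inc is written with a wildcard so that the function is total on lists of 0s and 1s.

```haskell
type Nat = Int

data Tree a = Leaf a | Node Nat (Tree a) (Tree a)
data Digit a = Zero | One (Tree a)
type RAList a = [Digit a]

size :: Tree a -> Nat
size (Leaf _) = 1
size (Node n _ _) = n

node :: Tree a -> Tree a -> Tree a
node t1 t2 = Node (size t1 + size t2) t1 t2

consRA :: a -> RAList a -> RAList a
consRA x xs = consT (Leaf x) xs

-- Adding a tree mirrors adding 1 to a binary number: a carry
-- combines two trees of equal size into one of twice the size.
consT :: Tree a -> RAList a -> RAList a
consT t1 [] = [One t1]
consT t1 (Zero : xs) = One t1 : xs
consT t1 (One t2 : xs) = Zero : consT (node t1 t2) xs

-- Hypothetical helper: the digit pattern, least significant bit first.
bits :: RAList a -> [Int]
bits = map bit
  where
    bit Zero = 0
    bit (One _) = 1

-- Binary increment on digit lists, least significant bit first.
inc :: [Int] -> [Int]
inc [] = [1]
inc (0 : bs) = 1 : bs
inc (_ : bs) = 0 : inc bs
```

Building a list with n applications of consRA gives the same digit pattern as n applications of inc to []. For example, bits (foldr consRA [] "abcdef") is [0,1,1].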

The definition of unconsRA follows that of the dec operation, which decrements a 
binary counter: 

dec [1] = []
dec (1 : ds) = 0 : ds
dec (0 : ds) = 1 : dec ds

Here is the definition of unconsRA:

unconsRA xs = (x, ys) where (Leaf x, ys) = unconsT xs

unconsT :: RAList a -> (Tree a, RAList a)
unconsT (One t : xs) = if null xs then (t, []) else (t, Zero : xs)
unconsT (Zero : xs) = (t1, One t2 : ys) where (Node _ t1 t2, ys) = unconsT xs

The code is a little subtle. To illustrate the fact that unconsT xs always returns a leaf 
as first component when xs is a well-formed random-access list, it is instructive to 
play through the example 

[Zero, Zero, One t]

where t is the perfect tree of size 4 that flattens to "abcd", given above. According
to the second clause of unconsT, the result is

(t1, One t2 : ys) where (Node _ t1 t2, ys) = unconsT [Zero, One (tree "abcd")]

Again according to the second clause, the right-hand side returns

(t3, One t4 : zs) where (Node _ t3 t4, zs) = unconsT [One (tree "abcd")]

Finally, according to the first clause of unconsT, we have

unconsT [One (tree "abcd")] = (tree "abcd", [])

That gives t3 = tree "ab", t4 = tree "cd" and zs = []. Hence we have t1 = Leaf 'a',
t2 = Leaf 'b', and ys = [One (tree "cd")], and so

unconsT [Zero, Zero, One (tree "abcd")]
  = (Leaf 'a', [One (Leaf 'b'), One (tree "cd")])

as required. 
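
The walk-through can be replayed mechanically. In the sketch below, abcd is our name for tree "abcd" written out by hand, and the final clause of unconsT, which reports an error on the empty list, is our addition rather than part of the definition in the text.

```haskell
type Nat = Int

data Tree a = Leaf a | Node Nat (Tree a) (Tree a)
  deriving (Eq, Show)
data Digit a = Zero | One (Tree a)
  deriving (Eq, Show)
type RAList a = [Digit a]

unconsRA :: RAList a -> (a, RAList a)
unconsRA xs = (x, ys) where (Leaf x, ys) = unconsT xs

-- Mirrors dec: a borrow splits a tree into its two halves.
unconsT :: RAList a -> (Tree a, RAList a)
unconsT (One t : xs) = if null xs then (t, []) else (t, Zero : xs)
unconsT (Zero : xs) = (t1, One t2 : ys) where (Node _ t1 t2, ys) = unconsT xs
unconsT [] = error "unconsT: empty list"  -- assumed clause, not in the text

abcd :: Tree Char
abcd = Node 4 (Node 2 (Leaf 'a') (Leaf 'b')) (Node 2 (Leaf 'c') (Leaf 'd'))

example :: (Tree Char, RAList Char)
example = unconsT [Zero, Zero, One abcd]
```

Evaluating example gives the pair (Leaf 'a', [One (Leaf 'b'), One (Node 2 (Leaf 'c') (Leaf 'd'))]), in agreement with the calculation above.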

Given unconsRA, we can define headRA and tailRA quite simply (see the exercises).
As we have seen in the previous chapter, a sequence of n cons operations,
or n uncons operations, on an initially empty list takes O(n) steps, so considered
separately they take amortised constant time. But when they are mixed, the best we
can say is that they each take O(log n) steps. The lookup and update operations also
take this time. In the following section we look at a data structure in which a lookup
operation takes constant time, though an update operation goes from logarithmic
time to linear time.


3.3 Arrays 

One of the main differences between functional and procedural algorithms is that 
the former rely on lists as the basic carrier of information while the latter rely on 
arrays. In functional algorithms input usually consists of a list of values, whereas 
in procedural algorithms input values are usually assumed to be presented as the 
elements of an array. For a procedural programmer array updates are destructive: 
once an array is updated by changing the value at a particular index, the old array is 
lost. In functional programming, data structures are persistent because any named 
structure may be referred to at some other point in the computation and therefore has 
to continue to exist. Consequently, any update operation, even at a single index, has 
to be implemented by making a new copy of the whole array. Because they cannot be 
changed but only copied, purely functional arrays are known as immutable arrays. It 
is possible to get round this problem and allow mutable structures by encapsulating 
the operations in a suitable monad, but we will not introduce monads in this book. 

Wholesale or monolithic updates, on the other hand, are fine. Changing all or 
some of the entries at one go involves copying the array only once. Haskell provides 
a number of such wholesale operations in the library Data.Array. The purpose of
this section is simply to describe the main functions in this library. 

The type Array i e consists of arrays with indices of type i and elements of type e. 
The basic operation for constructing arrays is a function

array :: Ix i => (i, i) -> [(i, e)] -> Array i e

The type class Ix restricts what can be an index; usually this is an integer or a
character, types that can be converted into a contiguous range of values. The first argument
to array is a pair of bounds, the lowest and highest indices in the array. The second
argument is an association list of index-value pairs. Building an array through array
takes linear time in the length of the association list and the size of the array.

A simple variant of array is listArray, which takes just a list of elements:

listArray :: Ix i => (i, i) -> [e] -> Array i e
listArray (l, r) xs = array (l, r) (zip [l..r] xs)
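
As a brief illustration of these two functions, here is a sketch that uses the Data.Array module from the array package shipped with GHC. The arrays squares and letters are our own examples.

```haskell
import Data.Array (Array, array, bounds, elems, listArray, (!))

-- An array built from explicit index-value pairs.
squares :: Array Int Int
squares = array (0, 4) [(i, i * i) | i <- [0 .. 4]]

-- The same kind of construction from a plain list of elements.
letters :: Array Int Char
letters = listArray (1, 3) "abc"
```

Here squares ! 3 is 9, bounds letters is (1,3), and elems letters is "abc".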

Finally, there is another way of building arrays, called accumArray, whose type 
seems rather complicated: 

accumArray :: Ix i => (e -> v -> e) -> e -> (i, i) -> [(i, v)] -> Array i e

The arguments are: an 'accumulating' function for transforming array entries
e and new values v into new entries; an initial entry for each index; a pair of
bounds for the array; and an association list of index-value pairs. The result of
accumArray f e (l, r) ivs is an array with bounds (l, r) and initial entries e everywhere,
built by processing the association list ivs from left to right, combining old
entries and values into new entries using the accumulating function f. The process
takes linear time in the length of the association list, assuming that the accumulating
function takes constant time. In symbols we have

accumArray f e (l, r) ivs =
  array (l, r) [(j, foldl f e [v | (i, v) <- ivs, i == j]) | j <- [l..r]]

Well, nearly. In the Data.Array definition there is an added restriction on ivs, namely
that every index in ivs should lie in the specified range (l, r). If this condition is not
met, then the left-hand side returns an error while the right-hand side does not.
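
The equation can be turned into an executable specification and compared with the library function on in-range inputs. In this sketch accumSpec is our name, and the index type is fixed to Int (an assumption) so that the enumeration [l .. r] is available.

```haskell
import Data.Array (Array, accumArray, array)

-- Executable form of the right-hand side of the equation in the text.
-- It takes time proportional to the product of the array size and the
-- length of ivs, so it serves as a specification only.
accumSpec :: (e -> v -> e) -> e -> (Int, Int) -> [(Int, v)] -> Array Int e
accumSpec f e (l, r) ivs =
  array (l, r) [(j, foldl f e [v | (i, v) <- ivs, i == j]) | j <- [l .. r]]

-- A small association list whose indices all lie in the range (1, 3).
sample :: [(Int, Int)]
sample = [(1, 20), (2, 30), (1, 40), (2, 50)]
```

On sample, both accumSpec (+) 0 (1, 3) and accumArray (+) 0 (1, 3) produce the array with entries 60, 80, and 0.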

For example, we have 

accumArray (+) 0 (1, 3) [(1, 20), (2, 30), (1, 40), (2, 50)]
  = array (1, 3) [(1, 60), (2, 80), (3, 0)]
accumArray (flip (:)) [] ('A', 'C') [('A', "Apple"), ('A', "Apricot")]
  = array ('A', 'C') [('A', ["Apricot", "Apple"]), ('B', []), ('C', [])]

As just one useful application of accumArray, suppose we are given a list of n 
natural numbers, all in the range (0, m) for some m. We can sort this list in O(m + n)
steps in the following way:

sort :: Nat -> [Nat] -> [Nat]
sort m xs = concatMap copy (assocs a)
  where a = accumArray (+) 0 (0, m) (zip xs (repeat 1))
        copy (x, k) = replicate k x

The function assocs :: Array i e -> [(i, e)] returns the list of index-value pairs in
index order. The function elems, which can be defined by

elems :: Array i e -> [e]
elems = map snd . assocs

converts an array to a list of its elements in index order. Thus elems is the abstraction
function for converting arrays back into standard lists.
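
The counting sort described above can be run directly against Data.Array. In the following sketch the function is renamed countSort, a name of our choosing, to avoid a clash with Data.List.sort.

```haskell
import Data.Array (Array, accumArray, assocs)

type Nat = Int

-- Count the occurrences of each number in the range (0, m), then
-- emit each number as many times as it was counted.
countSort :: Nat -> [Nat] -> [Nat]
countSort m xs = concatMap copy (assocs a)
  where
    a :: Array Nat Nat
    a = accumArray (+) 0 (0, m) (zip xs (repeat 1))
    copy (x, k) = replicate k x
```

For example, countSort 9 [3,1,4,1,5,9,2,6,5,3,5] returns [1,1,2,3,3,4,5,5,5,6,9]. Every element of the input must lie in the range (0, m), otherwise accumArray reports an error.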

The good news is that with arrays the lookup function, (!), takes constant time. 
For instance, 

assocs xa = [(i, xa ! i) | i <- range (bounds xa)]

takes time proportional to the size of the array. The Data.Array function bounds
returns the bounds of an array, and range enumerates the values between the lower
and upper bound.

The bad news, as we have said, is that array updates take linear time in the size of
the array. The update function is (//), with type

(//) :: Ix i => Array i e -> [(i, e)] -> Array i e

Thus the operation xa // ies updates the array xa with the associations in ies. For
example,

foldl update (array (1, n) []) (zip [1..n] xs)
  where update xa (i, x) = xa // [(i, x)]

builds an array but takes Θ(n²) steps to do it, while the equivalent expression
array (1, n) (zip [1..n] xs)
takes Θ(n) steps.

In summary, indexing and wholesale operations are efficient for arrays, while 
individual updates are not. “We can remember it for you wholesale”, as Philip K. 
Dick entitled one of his short stories (see [1]). 
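
The contrast between individual and wholesale updates can be made concrete with the two expressions discussed above. The names slowBuild and fastBuild are ours; both functions assume that xs has at least n elements.

```haskell
import Data.Array (Array, array, (//))

-- One element at a time: each use of (//) copies the whole array,
-- so n updates take quadratic time in total.
slowBuild :: Int -> [a] -> Array Int a
slowBuild n xs = foldl update (array (1, n) []) (zip [1 .. n] xs)
  where update xa (i, x) = xa // [(i, x)]

-- Monolithic construction: the array is built in a single pass.
fastBuild :: Int -> [a] -> Array Int a
fastBuild n xs = array (1, n) (zip [1 .. n] xs)
```

The two functions return equal arrays, for instance slowBuild 5 "hello" == fastBuild 5 "hello", but only the second takes linear time.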


3.4 Chapter notes 

The idea of modelling a symmetric list, also known as a double-ended queue or 
deque, by a pair of lists has been thought of many times. It appears in Okasaki’s 
book [5] on functional data structures, where the idea is attributed to Gries [2] and 
Hood and Melville [3]. See also [4], which introduces the representation invariant 
used above. Random-access lists, also known as one-sided flexible arrays, are 
discussed in Chapter 9 of [5]. That chapter also presents some alternative number 
representations, including binary numbers constructed from 1s and 2s rather than
0s and 1s. Using such a representation, one can implement headRA to run in O(1)
worst-case time. The monolithic array operations of Data.Array were proposed by
Philip Wadler in [6], although others had earlier suggested similar operations. The
Haskell Platform provides a number of other libraries for handling arrays, including
unboxed, mutable, and storable arrays.

Exercises

Exercise 3.1 Write down all the ways "abcd" can be represented as a symmetric
list. Give examples to show how each of these representations can be generated. 

Exercise 3.2 Define the value nilSL that returns an empty symmetric list, and the 
two functions nullSL and singleSL for testing whether a symmetric list is empty or 
a singleton. Also, define lengthSL. 

Exercise 3.3 Define the functions consSL and headSL. 

Exercise 3.4 Define the function initSL. 

Exercise 3.5 Implement dropWhileSL so that

dropWhile p . fromSL = fromSL . dropWhileSL p

Exercise 3.6 Define initsSL with the type

initsSL :: SymList a -> SymList (SymList a)

Write down the equation that expresses the relationship between fromSL, initsSL,
and inits.

Exercise 3.7 Give an online definition of inits that does not use symmetric lists for 
which length ■ inits takes linear time. 

Exercise 3.8 Estimate the running time of fromT when applied to a perfect tree of
size 2^p, where fromT was defined by

fromT :: Tree a -> [a]
fromT (Leaf x) = [x]
fromT (Node _ t1 t2) = fromT t1 ++ fromT t2

One way to reduce the running time is to introduce a function

fromTs :: [Tree a] -> [a]

and define fromT t = fromTs [t]. Give an efficient definition of fromTs. Variations
of this particular optimisation for flattening a tree will be used a number of times in
the rest of the book.

Exercise 3.9 What change to the definition of fetchRA is needed to produce a 
suitable error message when the index is too large? 

Exercise 3.10 Give a definition of the function toRA :: [a] -> RAList a that converts
a list into a random-access list. 

Exercise 3.11 Give a definition of updateRA. 

Exercise 3.12 Following on from the previous exercise, give a one-line definition 
of a function 

(//) :: RAList a -> [(Nat, a)] -> RAList a

so that xs // kxs is the result of carrying out a sequence of updates kxs on a
random-access list xs. The updates should be applied from left to right. Hint: both flip and
the standard Haskell function

uncurry :: (a -> b -> c) -> (a, b) -> c
uncurry f (x, y) = f x y

will be useful.

Exercise 3.13 Define headRA and tailRA.

Exercise 3.14 Suppose you want to define an array fa with bounds (0, n) whose
kth entry is k!, the factorial of k. Complete the definition

fa = listArray (0, n) ????

in two different ways, one using scanl and one not. (Hint: for the second definition
use the fact that fa ! i = i * fa ! (i - 1).)

Exercise 3.15 There is another function accum in Data.Array with the type

accum :: Ix i => (e -> v -> e) -> Array i e -> [(i, v)] -> Array i e

This function takes an accumulating function, an array, and an association list. It
computes new array entries by combining elements from the association list with
the accumulating function. More precisely,

(accum f a ivs) ! j = foldl f (a ! j) [v | (i, v) <- ivs, i == j]

Define accumArray in terms of accum.

Answers

Answer 3.1 There are three ways:

("a", "dcb"), ("ab", "dc"), ("abc", "d")

We have, for example,

("a", "dcb") = foldl (flip snocSL) nilSL "abcd"
("abc", "d") = foldr consSL nilSL "abcd"
("ab", "dc") = consSL 'a' (snocSL 'd' (foldr consSL nilSL "bc"))

Answer 3.2 We have 

nilSL :: SymList a
nilSL = ([], [])

nullSL :: SymList a -> Bool
nullSL (xs, ys) = null xs && null ys

singleSL :: SymList a -> Bool
singleSL (xs, ys) = (null xs && single ys) || (null ys && single xs)

lengthSL :: SymList a -> Nat
lengthSL (xs, ys) = length xs + length ys

Answer 3.3 We have

consSL :: a -> SymList a -> SymList a
consSL x (xs, ys) = if null ys then ([x], xs) else (x : xs, ys)

headSL :: SymList a -> a
headSL (xs, ys) = if null xs then head ys else head xs

Answer 3.4 We have

initSL :: SymList a -> SymList a
initSL (xs, ys)
  | null ys = if null xs then ⊥ else nilSL
  | single ys = (us, reverse vs)
  | otherwise = (xs, tail ys)
  where (us, vs) = splitAt (length xs `div` 2) xs

Answer 3.5 We have 

dropWhileSL p xs
  | nullSL xs = nilSL
  | p (headSL xs) = dropWhileSL p (tailSL xs)
  | otherwise = xs

Answer 3.6 We can define

initsSL xs = if nullSL xs
             then snocSL xs nilSL
             else snocSL xs (initsSL (initSL xs))

The relationship is

inits . fromSL = map fromSL . fromSL . initsSL

Answer 3.7 We have

inits = map reverse . scanl (flip (:)) []

Answer 3.8 We have

T(p) = 2T(p-1) + Θ(2^(p-1))

where the Θ(2^(p-1)) term accounts for the concatenation. That gives T(p) = Θ(p 2^p).
The new definition is

fromT t = fromTs [t]
fromTs [] = []
fromTs (Leaf x : ts) = x : fromTs ts
fromTs (Node _ t1 t2 : ts) = fromTs (t1 : t2 : ts)

This definition has a running time of Θ(2^p) steps. Another method is to use an
accumulating function.

Answer 3.9 Add a clause

fetchRA k [] = error "index too large"

Answer 3.10 We have 

toRA :: [a] -> RAList a
toRA = foldr consRA nilRA

Answer 3.11 We have

updateRA k x (Zero : xs) = Zero : updateRA k x xs
updateRA k x (One t : xs) = if k < size t
                            then One (updateT k x t) : xs
                            else One t : updateRA (k - size t) x xs

updateT :: Nat -> a -> Tree a -> Tree a
updateT 0 x (Leaf y) = Leaf x
updateT k x (Node n t1 t2) = if k < m
                             then Node n (updateT k x t1) t2
                             else Node n t1 (updateT (k - m) x t2)
                             where m = n `div` 2

Answer 3.12 We have

(//) :: RAList a -> [(Nat, a)] -> RAList a
(//) = foldl (flip (uncurry updateRA))

For example,

fromRA (toRA [0..3] // [(1, 7), (2, 3), (3, 4), (2, 8)]) = [0, 7, 8, 4]

The intermediate updates are

[0, 1, 2, 3], [0, 7, 2, 3], [0, 7, 3, 3], [0, 7, 3, 4], [0, 7, 8, 4]

If there are m updates on a random-access list of length n, the running time of // is
O(m log n) steps.

Answer 3.13 The definitions are 

headRA xs = fst (unconsRA xs)
tailRA xs = snd (unconsRA xs)

Answer 3.14 We have

fa = listArray (0, n) (scanl (*) 1 [1..n])

fa = listArray (0, n) (1 : [i * fa ! (i - 1) | i <- [1..n]])

The listArray construction is not strict in the array elements, so recursive definitions
such as the one above are legitimate.

Answer 3.15 We have

accumArray f e bnds ivs = accum f (array bnds [(i, e) | i <- range bnds]) ivs

Divide and conquer (from the Latin divide et impera, and more accurately translated
as divide and rule) is the first algorithm design technique we will study in depth. 
Given a problem to solve, either solve it directly if its size is sufficiently small and 
it is easy to do so, or else divide it into one or more subproblems, solve each of 
these subproblems, and then combine the solutions to give a solution to the original 
problem. Such a strategy covers pretty much everything about problem solving 
in computer science, or mathematics, or life for that matter, but the feature that 
makes it into a simple and effective computational tool is that each subproblem is 
simply the original problem on an input of smaller size. Hence each subproblem is 
solved by the same strategy. A divide-and-conquer algorithm is therefore essentially 
recursive in nature. 

Phrased this way, every functional algorithm that depends on explicit recursion 
can be thought of as a divide-and-conquer algorithm. After all, one possible
decomposition of a problem of size n > 0 is to divide it into a problem of size n - 1
and a problem of size 1. For instance, an algorithm expressed as a foldr has
essentially this decomposition. But in a truly divide-and-conquer algorithm there are
two other important aspects. One is that each subproblem should have a size that
is some fraction of the input size, a fraction like n/2 or n/4. In many cases the
subproblems will have equal size, or as close to equal size as possible. A problem
of size n might therefore be divided into two subproblems each of size n/2, a very
common form of decomposition that we will meet later on. There are also examples
of divide-and-conquer algorithms in which the subproblems have different sizes, for
example one of size n/5 and the other of size 7 × n/10. We will encounter such an
example in Chapter 6. The second important aspect is that the subproblems should 
be independent of each other, so the work done in solving them is not duplicated. 
Problems in which the subproblems overlap and have many sub-subproblems in 
common can be tackled by the dynamic programming strategy, a topic we will take 
up in Part Five. 

Finally, because the subproblems are independent and can be solved concurrently 
as well as sequentially, divide-and-conquer algorithms are highly suited to exploiting 
parallelism. We will not pursue parallel programming in this book, but see Simon 
Marlow’s book Parallel and Concurrent Programming in Haskell (O’Reilly, 2013), 
for an excellent coverage of the topic.

Binary search is probably the simplest example of divide and conquer. A search
problem is solved by dividing it into two subproblems, each of size approximately 
half the original. The distinguishing feature of binary search is that one of these 
subproblems is trivial. In this chapter we introduce binary search by looking at two 
examples that can profitably use it, and then go on to encapsulate binary search as a 
data structure, a binary search tree. 


4.1 A one-dimensional search problem 

In the first problem we are given a strictly increasing function f from natural
numbers to natural numbers (so x < y ⇒ f x < f y for all x and y) together with a
target number t. The object is to find x, if it exists, such that t = f(x). Since f is
strictly increasing, there is at most one solution. Furthermore, x < f(x + 1) if f is
strictly increasing, so the search can be confined to the interval 0 ≤ x ≤ t (inclusive).
Recalling that Nat is a Haskell type synonym for Int, we have

search :: (Nat -> Nat) -> Nat -> [Nat]
search f t = [x | x <- [0..t], t == f x]

The result of search is either an empty list or a singleton list. The only assumption
we have really used about f is that t = f(x) ⇒ 0 ≤ x ≤ t for all x and t. This method,
which searches for a value incrementally in steps of one, is called linear search. 

There are better methods than linear search for solving our problem, and we 
are going to give two. In both methods the first step is to make the search interval 
explicit: 

search f t = seek (0, t) where seek (a, b) = [x | x <- [a..b], t == f x]

The next step is to find a better version of seek. If a > b, then seek (a, b) = [].
Otherwise, let m be any number in the range a ≤ m ≤ b. We then have

seek (a, b) = [x | x <- [a..m-1], t == f x] ++
              [m | t == f m] ++
              [x | x <- [m+1..b], t == f x]

The key observation is that if t < f(m), then the last two lists are empty; if t = f(m),
then we are done; and if t > f(m), then the first two lists are empty. Here we do use
the fact that f is increasing. Hence we can define

search :: (Nat -> Nat) -> Nat -> [Nat]
search f t = seek (0, t)
  where seek (a, b) | a > b = []
                    | t < f m = seek (a, m - 1)
                    | t == f m = [m]
                    | otherwise = seek (m + 1, b)
          where m = choose (a, b)

It remains to choose m. The obvious choice to balance the two subproblems is to
take m = ⌊(a + b)/2⌋, the middle of the interval. In other words,

choose (a, b) = (a + b) `div` 2

This is binary search. A search problem is divided into a single subproblem of about 
half the size. It is easy to appreciate that binary search takes logarithmic time in the 
size of the interval being searched, because the interval halves at each step. Thus 
search f t takes O(log t) steps. To be more precise we have to formulate and solve
the associated recurrence relation, but we will leave that discussion until after we 
have dealt with the second method for solving our problem. 

There are a number of aspects of the above definition of search that make another 
solution worth exploration, not the least of which is the fact that the definition is 
incorrect! For example, suppose f(n) = 2^n. Then evaluation of search f 1024 returns
[] instead of the correct answer [10]. Pause for a moment to see if you can spot the
bug.


What has gone wrong is not the definition of search but its type. The first step
requires evaluation of the test 1024 < 2^512, and 2^512 is a huge number, well beyond
the capabilities of limited-precision arithmetic. In fact, as an element of Nat,
evaluation of 2^512 returns 0, causing the test to incorrectly return False. The situation
can be remedied by changing Nat to Integer, but the numbers are still huge and the 
calculations can be very time-consuming. 
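
The bug is easy to reproduce. The sketch below generalises the type of search over the number type, which is our own change (the text fixes Nat to be Int), so that the same code can be run at both Int and Integer. The names atInt and atInteger are ours.

```haskell
-- First version of binary search, with the midpoint choice inlined.
search :: Integral a => (a -> a) -> a -> [a]
search f t = seek (0, t)
  where
    seek (a, b)
      | a > b = []
      | t < f m = seek (a, m - 1)
      | t == f m = [m]
      | otherwise = seek (m + 1, b)
      where m = (a + b) `div` 2

-- Limited precision: 2 ^ 512 wraps around to 0, misleading the search.
atInt :: [Int]
atInt = search (2 ^) 1024

-- Arbitrary precision: correct, at the cost of computing huge numbers.
atInteger :: [Integer]
atInteger = search (2 ^) 1024
```

With GHC's fixed-width Int, atInt is [] while atInteger is [10].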

The second, minor problem with search is that f is evaluated twice at each step.
That is easily solved with a suitable local definition, but the fact still remains that in 
the worst case there are three comparison tests at each step. Can we do better? 

Yes, and here is the idea: we first find integers a and b such that f(a) < t ≤ f(b)
and then search only the interval [a+1..b]. If t ≤ f(0), then we can invent a
fictitious value f(-1) = -∞ and set (a, b) = (-1, 0); otherwise we can find a and b
by looking at the values of f for the numbers 1, 2, 4, 8, ... until a value p is found for
which f(2^(p-1)) < t ≤ f(2^p). Such a value is guaranteed to exist, because f is strictly
increasing. The function bound computes such an interval:

bound :: (Nat -> Nat) -> Nat -> (Int, Nat)
bound f t = if t <= f 0 then (-1, 0) else (b `div` 2, b)
  where b = until done (*2) 1
        done b = t <= f b

It takes p + 1 evaluations to compute bound f t when f(2^(p-1)) < t ≤ f(2^p). In the
worst case, when f = id, that gives Θ(log n) evaluations, but when f(n) = 2^n, only
Θ(log (log n)) evaluations are required.

Now, to search the interval [a+1..b] we need only to find the smallest x such
that t ≤ f(x). Such a value is guaranteed to exist because t ≤ f(b). That gives

search f t = if f x == t then [x] else []
  where x = smallest (bound f t)
        smallest (a, b) = head [x | x <- [a+1..b], t <= f x]

The definition of smallest uses linear search, but, as we have seen above, a better
method is to split the interval: if a + 1 < b then for any m in the range a < m < b we
have

smallest (a, b) = head ([x | x <- [a+1..m], t <= f x] ++
                        [x | x <- [m+1..b], t <= f x])

This time, if t ≤ f(m), then the first list is not empty; otherwise it is. Hence we can
write

search :: (Nat -> Nat) -> Nat -> [Nat]
search f t = if f x == t then [x] else []
  where x = smallest (bound f t) f t

where

smallest (a, b) f t
  | a + 1 == b = b
  | t <= f m = smallest (a, m) f t
  | otherwise = smallest (m, b) f t
  where m = (a + b) `div` 2

This is our second version of binary search. We have made smallest a separate
top-level function because we will need it in the following section. Note that
smallest (a, b) f t is well defined even if there is no x in the range a < x ≤ b such
that t ≤ f x; in this case the value returned is b. In this version of binary search
there is only one comparison involving f at each step, as compared with two in the
worst case of the previous version. Moreover, search works with limited-precision
arithmetic. Note, finally, that f(a) is never evaluated during the algorithm, so the
fictitious value f(-1) = -∞ is never required.
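
Putting the pieces together gives the following self-contained sketch of the second version. The only departures from the text are the type signature added to smallest and the renaming of the argument of done to x.

```haskell
type Nat = Int

-- Find (a, b) with f a < t <= f b by doubling b, starting from 1.
bound :: (Nat -> Nat) -> Nat -> (Int, Nat)
bound f t = if t <= f 0 then (-1, 0) else (b `div` 2, b)
  where b = until done (* 2) 1
        done x = t <= f x

-- Smallest x in the range a < x <= b with t <= f x, by halving the interval.
smallest :: (Int, Nat) -> (Nat -> Nat) -> Nat -> Nat
smallest (a, b) f t
  | a + 1 == b = b
  | t <= f m = smallest (a, m) f t
  | otherwise = smallest (m, b) f t
  where m = (a + b) `div` 2

search :: (Nat -> Nat) -> Nat -> [Nat]
search f t = if f x == t then [x] else []
  where x = smallest (bound f t) f t
```

This time search (2 ^) 1024 returns [10] even at type Int: bound stops at the interval (8, 16), so f is never applied to an argument larger than 16.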

To time this version, let T(n) denote the number of evaluations of f in the
computation of smallest (a, b) f t when interval (a, b) contains n numbers, so that
n = b - a + 1. The fast and loose way to define T(n) is to write

T(2) = 0
T(n) = T(n/2) + 1

To solve this recurrence, we can unfold it to give

T(n) = 1 + T(n/2) = 2 + T(n/4) = 3 + T(n/8) = ··· = k + T(n/2^k)

It follows that T(n) = k if n = 2^(k+1). If n is not a power of two, so 2^k < n < 2^(k+1),
then we can appeal to the assumption that T(n) is an increasing function of n to
arrive at the estimate T(n) ≤ ⌈log n⌉. If f takes constant time, then binary search
takes O(log t) steps.

Here is where we are playing fast and loose. For one thing, the subproblems do 
not both have size n/2. If n is odd, then both subproblems have size ⌈(n + 1)/2⌉,
while if n is even, then just one of the subproblems has this size. For another thing,
the sizes of intervals are natural numbers and T(n) is defined only when n is a
natural number, so T(n/2) is not well-defined. Finally, the assumption that when
the problem size increases the complexity cannot decrease is not always valid - it
depends on the algorithm. Neither of the first two issues usually matters, especially
when we are after only asymptotic bounds, such as T(n) = O(log n). But sometimes
they do. This is certainly the case when we are after an exact number. For instance,
the exact number of evaluations of f in the worst case of smallest on an interval of
size n is given by the recurrence T(n) = T(⌈(n + 1)/2⌉) + 1 and T(2) = 0. The exact
solution turns out to be T(n) = ⌈log (n - 1)⌉ for 2 ≤ n (see Exercise 4.3). However,
in the main we will ignore floors and ceilings in recurrences, and carry on with fast 
and loose reasoning. 
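
The exact solution quoted above can be checked numerically for small n. In this sketch evals transcribes the exact recurrence and ceilLog2 computes a ceiling of a base-2 logarithm using integer arithmetic only; both names are ours. It is a sanity check rather than a proof.

```haskell
-- Exact recurrence: T(2) = 0 and T(n) = T(ceiling((n + 1)/2)) + 1 for n > 2.
evals :: Int -> Int
evals 2 = 0
evals n = evals ((n + 2) `div` 2) + 1  -- (n + 2) `div` 2 is ceiling((n + 1)/2)

-- Ceiling of the base-2 logarithm of m, for m >= 1: the number of
-- powers of two that are strictly smaller than m.
ceilLog2 :: Int -> Int
ceilLog2 m = length (takeWhile (< m) (iterate (* 2) 1))

-- Does the closed form agree with the recurrence for 2 <= n <= limit?
agreesUpTo :: Int -> Bool
agreesUpTo limit = all (\n -> evals n == ceilLog2 (n - 1)) [2 .. limit]
```

For example, evals 6 unfolds through sizes 4, 3, and 2, giving 3, which equals ceilLog2 5.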

Here is another recurrence relation that we will mention now, one that will 
crop up frequently in the following chapter: T(n) = 2T(n/2) + Θ(n). To solve this
recurrence we unfold it, replacing Θ(n) by cn to avoid tripping up on multiple Θs.
Then we obtain

T(n) = cn + 2T(n/2)
     = cn + 2(cn/2 + 2T(n/4))
     = 2cn + 4T(n/4)
     = ···
     = kcn + 2^k T(n/2^k)

Supposing 2^(k-1) < n ≤ 2^k, so k = ⌈log n⌉, we obtain

T(n) = cn⌈log n⌉ + Θ(2^⌈log n⌉)

and so T(n) = Θ(n log n). Such a running time is sometimes called linearithmic, a
portmanteau word that combines linear and logarithmic. We will meet other more
difficult recurrence relations in the following section.

Figure 4.1 An example grid


4.2 A two-dimensional search problem 

The second problem is much more interesting. This time we are given a function f
from pairs of natural numbers to natural numbers with the property that f is strictly
increasing in each argument. Given t, we have to find all pairs (x,y) such that
$f(x,y) = t$. Unlike the one-dimensional case, there can be many solutions. To get a
feel for the problem, take a look at the grid in Figure 4.1. Positions on the grid are 
given by Cartesian coordinates (x,y), where x is the column number and y is the row 
number. The bottom-left element is at position (0,0) and the top-right element is at 
position (11,13). What systematic procedure would you use to find all the positions 
that contain the number 472? Pause for a moment to answer this question. 


Did you try to use binary search? After all, that is what the chapter is about. The 
difficulty is that it is not easy to see exactly how to program the search in this two- 
dimensional case. So we will start slowly and begin with the obvious generalisation 
of one-dimensional search to a two-dimensional $(t+1) \times (t+1)$ grid:

    search f t = [(x,y) | x <- [0..t], y <- [0..t], t == f (x,y)]

This method, which takes $\Theta(t^2)$ steps, searches the grid upwards column by column,
starting at the leftmost column. Also it takes no account of the fact that searching a
column can be abandoned as soon as an (x,y) is found for which $t < f(x,y)$. There
has to be a better way; we shall describe no fewer than four, including three versions
that employ binary search.

The first improvement is to start at the top-left rather than the bottom-left corner:

    search f t = [(x,y) | x <- [0..t], y <- [t,t-1..0], t == f (x,y)]

As in binary search, a more general version is obtained by making the search interval 
explicit: 

    searchIn (a,b) f t = [(x,y) | x <- [a..t], y <- [b,b-1..0], t == f (x,y)]

Thus search = searchIn (0,t). Next, we examine the various cases that can arise.
First, it follows at once from the definition of searchIn that

    searchIn (a,b) f t | a > t || b < 0 = []

Now suppose the search interval is not empty and $f(a,b) < t$. In this case column a
can be eliminated from further consideration since $f(a,b') \le f(a,b)$ for $b' \le b$. That
means

    searchIn (a,b) f t | f (a,b) < t = searchIn (a+1,b) f t

In the dual case, $f(a,b) > t$, row b can be eliminated since $f(a',b) \ge f(a,b)$ for
$a' \ge a$. That means

    searchIn (a,b) f t | f (a,b) > t = searchIn (a,b-1) f t

Finally, if $f(a,b) = t$, then both column a and row b can be eliminated since
$f(a,b') < f(a,b)$ if $b' < b$ and $f(a',b) > f(a,b)$ if $a' > a$. It is only in this last case
that we use the fact that f is strictly increasing, rather than just weakly increasing,
in both arguments. 

Putting the four cases together, and renaming (a,b) as (x,y), we arrive at

    search f t = searchIn (0,t)
      where searchIn (x,y) | x > t || y < 0 = []
                           | z < t          = searchIn (x+1,y)
                           | z == t         = (x,y) : searchIn (x+1,y-1)
                           | z > t          = searchIn (x,y-1)
                           where z = f (x,y)

This method is known as saddleback search. It is fairly easy to see it requires only
$\Theta(t)$ evaluations of f. More precisely, suppose there is a $p \times q$ rectangle to search. In
the best case, when the search proceeds along the diagonal of the rectangle, finding
occurrences of t at each step, there are (p min q) evaluations of f. In the worst
case, when the search proceeds along the edges of the rectangle, there are $p + q - 1$
evaluations of f. As just one example, with $f(x,y) = x^2 + 3^y$ and $t = 20259$, it takes
20402 evaluations of f to obtain the answer [(24,9)]. That is quite close to the best
case. 
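To make the evaluation count concrete, here is a minimal sketch of saddleback search that also tallies the number of evaluations of f. The name saddlebackCount and the counter are our own additions rather than part of the definition above, and Integer is assumed so that powers such as 3^y cannot overflow.

```haskell
-- Saddleback search on the (t+1) x (t+1) grid, starting at the top-left
-- corner (0,t). Returns the solutions together with the number of
-- evaluations of f that were performed.
saddlebackCount :: ((Integer, Integer) -> Integer) -> Integer
                -> ([(Integer, Integer)], Int)
saddlebackCount f t = go (0, t) 0
  where
    go (x, y) n
      | x > t || y < 0 = ([], n)                   -- left the grid: stop
      | z < t          = go (x + 1, y) (n + 1)     -- column x is eliminated
      | z == t         = let (ps, m) = go (x + 1, y - 1) (n + 1)
                         in ((x, y) : ps, m)       -- row and column eliminated
      | otherwise      = go (x, y - 1) (n + 1)     -- row y is eliminated
      where z = f (x, y)
```

With f (x,y) = x*x + 3^y and t = 20259 this sketch returns ([(24,9)], 20402), matching the figures quoted in the text.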

Saddleback search can be improved because starting with the corners (0,t) and
(t,0) can be an overly pessimistic estimate of where the required values lie. Instead
we can use binary search to obtain better starting intervals. Recall from the previous
section that, provided $t \le f(b)$, the value of smallest (a,b) f t is the smallest x in the
range $a < x \le b$ such that $t \le f(x)$. Hence, if we define

    p = smallest (-1,t) (\y -> f (0,y)) t
    q = smallest (-1,t) (\x -> f (x,0)) t

then we can start saddleback search with the corners (0,p) and (q,0). This version
of saddleback search takes $\Theta(\log t) + \Theta(p + q)$ steps. Since p and q may be
substantially less than t, we can end up with a search that takes $\Theta(\log t)$ steps. For
example, again with $f(x,y) = x^2 + 3^y$ and $t = 20259$, we have p = 10 and q = 143.
It now takes a total of only 181 evaluations of f (including those evaluations in the
two binary searches) to compute the answer, a substantial saving over the previous
version.

Figure 4.2 A divide-and-conquer decomposition
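The refinement with binary-search corners can be sketched as follows. The definition of smallest given here is an assumption, reconstructed from the property recalled above (the text defines it in the previous section); the name searchCorners is ours, and Integer is used to avoid overflow.

```haskell
-- Assumed definition: for a < b and t <= f b, smallest (a,b) f t is the
-- least x with a < x <= b and t <= f x, found by binary search.
smallest :: (Integer, Integer) -> (Integer -> Integer) -> Integer -> Integer
smallest (a, b) f t
  | a + 1 == b = b
  | t <= f m   = smallest (a, m) f t
  | otherwise  = smallest (m, b) f t
  where m = (a + b) `div` 2

-- Saddleback search started from the tighter corners (0,p) and (q,0).
searchCorners :: ((Integer, Integer) -> Integer) -> Integer -> [(Integer, Integer)]
searchCorners f t = go (0, p)
  where
    p = smallest (-1, t) (\y -> f (0, y)) t
    q = smallest (-1, t) (\x -> f (x, 0)) t
    go (x, y)
      | x > q || y < 0 = []
      | z < t          = go (x + 1, y)
      | z == t         = (x, y) : go (x + 1, y - 1)
      | otherwise      = go (x, y - 1)
      where z = f (x, y)
```

The only change from plain saddleback search is the starting row p and the stopping column q; the walk itself is unchanged.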

A third way to search a grid is to head for a proper divide-and-conquer solution, 
looking at the middle element of the grid first. After all, that would be the obvious 
two-dimensional analogue of binary search. Suppose we have confined the search to
a rectangle with top-left corner $(x_1,y_1)$ and bottom-right corner $(x_2,y_2)$. What if we
first inspected the value $f(x,y)$ where $x = \lfloor (x_1+x_2)/2 \rfloor$ and $y = \lfloor (y_1+y_2)/2 \rfloor$? If
$f(x,y) < t$, we can throw away all elements of the lower-left rectangle. A picture of
this situation is given in Figure 4.2, in which $f(x,y) = z < t$ and the shaded rectangles
are those we need to keep. Similarly, if $f(x,y) > t$ the upper-right rectangle can be
discarded. And finally if $f(x,y) = t$, then both can be discarded.

This strategy will not, of course, maintain the property that the search space is
always a rectangle; instead we will have either two rectangles or an L-shape. We
can split an L-shape into two rectangles by making either a horizontal cut (as in
the figure) or a vertical cut. We can then continue the search in both the smaller
rectangles. Without writing down the algorithm, let us see if this approach yields a
faster algorithm by looking at the associated recurrence relation. Consider an $m \times n$
rectangle, and let $T(m,n)$ denote the number of evaluations of f required to search
it in the worst case. If m = 0 or n = 0, there is nothing to search and $T(m,n) = 0$. If
m = 1 or n = 1, then the problem reduces to one-dimensional binary search and we 
have 

$$T(1,n) = 1 + T(1,n/2)$$
$$T(m,1) = 1 + T(m/2,1)$$

Otherwise, when $m \ge 2$ and $n \ge 2$, we can throw away a rectangle of size at least
$m/2 \times n/2$. If we make a horizontal cut, then we are left with two rectangles, one of
size $m/2 \times n/2$ and the other of size $m/2 \times n$. Hence

$$T(m,n) = 1 + T(m/2,n/2) + T(m/2,n)$$

If we make a vertical cut, then we have

$$T(m,n) = 1 + T(m/2,n/2) + T(m,n/2)$$

In order to reach a base case quickly, it is better to make a horizontal cut if $m \le n$,
and a vertical cut if $m > n$.

To solve these recurrences assume m and n are powers of two and define U by
$U(i,j) = T(2^i,2^j)$. Supposing $i \le j$ and making a horizontal cut, we therefore have

$$U(0,j) = j$$
$$U(i+1,j+1) = 1 + U(i,j) + U(i,j+1)$$

It is not easy to solve this recurrence, but we can make an educated guess and
assume that the solution is exponential in i. If we set $U(i,j) = 2^i f(i,j) - 1$ for some
function f, then we obtain

$$f(0,j) = j + 1$$
$$2 f(i+1,j+1) = f(i,j) + f(i,j+1)$$

The second equation suggests another educated guess, namely that f is a linear
function of i and j. Setting $f(i,j) = ai + bj + c$, we obtain

$$bj + c = j + 1$$
$$2(a(i+1) + b(j+1) + c) = ai + bj + c + ai + b(j+1) + c$$

These equations are satisfied by taking $a = -1/2$, $b = 1$ and $c = 1$. Putting the pieces
together, we arrive at the solution

$$U(i,j) = 2^i (j - i/2 + 1) - 1$$

Setting $i = \log m$ and $j = \log n$, we therefore have

$$T(m,n) = 2^{\log m} (\log n - (\log m)/2 + 1) - 1 \le m \log (2n/\sqrt{m})$$

If $m \ge n$ we should make a vertical cut rather than a horizontal one; then we get an
algorithm with at most $n \log (2m/\sqrt{n})$ evaluations of f. In either case, if one of m
or n is much smaller than the other we get a better algorithm than saddleback search.
For example, again with $f(x,y) = x^2 + 3^y$ and $t = 20259$, this method needs only
96 evaluations of f to compute the answer, about half the number of the previous
version.

Figure 4.3 A two-dimensional divide-and-conquer decomposition
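The two educated guesses in the solution of the recurrence for U can be checked numerically. The sketch below, with our own names uRec and uClosed, evaluates the recurrence directly and compares it with the closed form, using Rational so that the term i/2 is exact.

```haskell
-- U(i,j) computed directly from the recurrence
--   U(0,j) = j,  U(i+1,j+1) = 1 + U(i,j) + U(i,j+1)
-- (intended for 0 <= i <= j; exponential time, so only for small i).
uRec :: Int -> Int -> Rational
uRec 0 j = fromIntegral j
uRec i j = 1 + uRec (i - 1) (j - 1) + uRec (i - 1) j

-- The guessed closed form 2^i (j - i/2 + 1) - 1.
uClosed :: Int -> Int -> Rational
uClosed i j = 2 ^ i * (fromIntegral j - fromIntegral i / 2 + 1) - 1
```

Agreement on a small range is of course not a proof, but the closed form is then easily verified by substituting it into the recurrence.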

But we can do better still. As before, suppose we have confined the search to a 
rectangle with top-left corner $(x_1,y_1)$ and bottom-right corner $(x_2,y_2)$. Assume that
$y_1 - y_2 \le x_2 - x_1$, so there are at least as many columns as rows. Suppose we carry
out a binary search

    x = smallest (x1 - 1,x2) (\x -> f (x,r)) t

along the middle row, $r = \lfloor (y_1+y_2)/2 \rfloor$. Recall that x is the smallest x in the range
$x_1 \le x \le x_2$, if it exists, such that $t \le f(x,r)$; otherwise $x = x_2$. If $t < f(x,r)$, then
we need continue the search only on the two rectangles $((x_1,y_1),(x-1,r+1))$
and $((x,r-1),(x_2,y_2))$. Figure 4.3 shows a picture of this case, where $z = f(x,r)$.
If $f(x,r) = t$, then we can cut out column x and continue the search on the two
rectangles $((x_1,y_1),(x-1,r+1))$ and $((x+1,r-1),(x_2,y_2))$. Finally, if $f(x,r) < t$,
so every entry in row r is less than t, then we can continue the search on the
single rectangle $((x_1,y_1),(x_2,r+1))$. The reasoning is dual if there are more rows
than columns. As a result, we can eliminate about half the elements of the array 
with a logarithmic number of probes. The algorithm incorporating this method is 
given in Figure 4.4. 

    search f t = from (0,p) (q,0) where
      p = smallest (-1,t) (\y -> f (0,y)) t
      q = smallest (-1,t) (\x -> f (x,0)) t
      from (x1,y1) (x2,y2)
        | x2 < x1 || y1 < y2 = []
        | y1 - y2 <= x2 - x1 = row x
        | otherwise          = col y
        where
          x = smallest (x1 - 1,x2) (\x -> f (x,r)) t
          y = smallest (y2 - 1,y1) (\y -> f (c,y)) t
          c = (x1 + x2) `div` 2
          r = (y1 + y2) `div` 2
          row x | z < t  = from (x1,y1) (x2,r+1)
                | z == t = (x,r) : from (x1,y1) (x-1,r+1) ++ from (x+1,r-1) (x2,y2)
                | z > t  = from (x1,y1) (x-1,r+1) ++ from (x,r-1) (x2,y2)
                where z = f (x,r)
          col y | z < t  = from (c+1,y1) (x2,y2)
                | z == t = (c,y) : from (x1,y1) (c-1,y+1) ++ from (c+1,y-1) (x2,y2)
                | z > t  = from (x1,y1) (c-1,y) ++ from (c+1,y-1) (x2,y2)
                where z = f (c,y)

Figure 4.4 The final program

As to the analysis, again let $T(m,n)$ denote the number of evaluations of f required
to search an $m \times n$ rectangle. Suppose $m \le n$. In the best case, when each binary
search on a row returns the leftmost or rightmost element, we have

$$T(m,n) = \log n + T(m/2,n)$$

with solution $T(m,n) = \Theta(\log m \times \log n)$. In the worst case, when each binary
search returns the middle element, we have

$$T(m,n) = \log n + 2T(m/2,n/2)$$

To solve this recurrence relation, again set $U(i,j) = T(2^i,2^j)$. Then we have

$$U(i,j) = \sum_{k=0}^{i-1} 2^k (j-k) = \Theta(2^i (j-i+1))$$

Hence $T(m,n) = \Theta(m \log (1+n/m))$. Dually, if $n < m$, we obtain a running time
of $T(m,n) = \Theta(n \log (1+m/n))$. For our example function $f(x,y) = x^2 + 3^y$ and
$t = 20259$, the final program of Figure 4.4 needs only 72 evaluations of f to compute
the answer, about three-quarters of the previous best time.
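For readers who want the intermediate steps of the worst-case solution, here is one way to unfold the recurrence. It is a sketch under two assumptions of ours: that the substitution gives $U(i,j) = j + 2U(i-1,j-1)$ for $0 < i \le j$, and that the base-case term $2^i U(0,j-i)$, which is itself of order $2^i (j-i+1)$, may be set aside when only the order of growth is wanted.

```latex
\begin{aligned}
U(i,j) &= j + 2\,U(i-1,j-1) \\
       &= j + 2(j-1) + 4\,U(i-2,j-2) \\
       &= \cdots \\
       &= \sum_{k=0}^{i-1} 2^k (j-k) + 2^i\,U(0,j-i) \\[4pt]
\sum_{k=0}^{i-1} 2^k (j-k)
       &= j\,(2^i - 1) - \bigl((i-2)\,2^i + 2\bigr)
        = 2^i (j-i+2) - (j+2)
\end{aligned}
```

Since $i \le j$, the last expression is $\Theta(2^i (j-i+1))$, and putting $2^i = m$ and $j - i = \log (n/m)$ gives the stated bound on $T(m,n)$.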

These bounds are asymptotically optimal. Any algorithm for searching an $m \times n$
rectangle has to perform at least

$$\Omega(m \log (1+n/m) + n \log (1+m/n))$$

evaluations of f. This lower bound shows that when $m = n$ we cannot do better
than $\Omega(m+n)$ comparisons. So saddleback search is the best possible method on a
square grid. But if $m < n$, then $m \le n \log (1+m/n)$ since $x \le \log (1+x)$ if $0 \le x \le 1$.
Thus when $m \le n$ we have the lower bound $\Omega(m \log (n/m))$, and when $m > n$ we
have the lower bound $\Omega(n \log (m/n))$.
The proof of the lower bound depends on the decision tree associated with the
problem. The role of decision trees in putting a lower bound on the running time of a
problem will be explained at the end of the next section in the context of sorting. So,
it is perhaps better to read that section first and then come back to what follows. But
here is the idea. Suppose there are $A(m,n)$ different possible answers to the problem.
For example, $A(1,1) = 2$ because there are two possible outcomes, either an empty
list or a singleton list; and $A(2,2) = 6$ because the possible outcomes are one empty
list, four possible singleton lists, and one possible doubleton list. Each test of $f(x,y)$
has three possible outcomes, $f(x,y) < t$, $f(x,y) = t$, and $f(x,y) > t$, so the height h of
the ternary decision tree has to satisfy $h \ge \log_3 A(m,n)$. Provided we can estimate
$A(m,n)$, this gives us a lower bound on the number of tests that have to be performed.

To estimate $A(m,n)$, observe that each list of pairs (x,y) in the range $0 \le x < n$
and $0 \le y < m$ with $f(x,y) = z$ is in a one-to-one correspondence with a step-shaped
path from the top-left corner of the $m \times n$ rectangle to the bottom-right corner, in
which the value z appears at the inner corners of the steps. This step shape is not
necessarily the one traced by the function search. The path from the top-left to
bottom-right corner contains m down-moves and n right-moves in some order, so
the number of such paths is $\binom{m+n}{n}$ (which is the same as $\binom{m+n}{m}$), so that is the value
of $A(m,n)$.

Another way to calculate $A(m,n)$ is to suppose there are k solutions. The required
value can appear in k rows in exactly $\binom{m}{k}$ ways, and for each way there are $\binom{n}{k}$
possible choices for the columns. Hence

$$A(m,n) = \sum_{k=0}^{m} \binom{m}{k} \binom{n}{k} = \binom{m+n}{n}$$

since the summation is an instance of Vandermonde's convolution, see [7]. Taking
logarithms, we obtain the lower bound

$$\log A(m,n) = \Omega(m \log (1+n/m) + n \log (1+m/n))$$

which is the result given above.
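The two ways of counting the possible answers can be compared numerically. This is only an illustrative sketch; the names choose, answersByPaths and answersBySolutions are ours.

```haskell
-- Binomial coefficient, with the usual convention that it is zero
-- when k is out of range.
choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Counting step-shaped paths with m down-moves and n right-moves.
answersByPaths :: Integer -> Integer -> Integer
answersByPaths m n = choose (m + n) n

-- Counting by the number k of solutions: choose k rows, then k columns.
answersBySolutions :: Integer -> Integer -> Integer
answersBySolutions m n = sum [choose m k * choose n k | k <- [0 .. m]]
```

For instance, both functions give 2 for a 1 x 1 grid and 6 for a 2 x 2 grid, in agreement with the small cases enumerated in the text.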


4.3 Binary search trees 

Binary search trees capture the essence of binary search as a data structure. The 
trees are based on the following type: 

    data Tree a = Null | Node (Tree a) a (Tree a)

A tree either is the null tree or consists of a node, which has a left subtree, a node 
value (also called its label), and a right subtree. This kind of tree is different from 
the one used in the construction of random-access lists in the previous chapter, in 
that values are stored at nodes rather than leaves. In general, trees can be classified 
according to the precise form of the branching structure, the location of information
in the tree, the presence or otherwise of subsidiary information, and the relationship 
between the information stored in different parts of the tree. We will encounter other 
kinds of tree in subsequent chapters. 

The size of a tree is the number of labels it contains: 

    size :: Tree a -> Nat
    size Null = 0
    size (Node l x r) = 1 + size l + size r

The values in a tree can be turned into a list by the function flatten:

    flatten :: Tree a -> [a]
    flatten Null = []
    flatten (Node l x r) = flatten l ++ [x] ++ flatten r

The running time of this definition of flatten is not linear in the size of the tree, 
an issue we have encountered before in Exercise 3.8. The solution is to use an 
accumulating parameter (see the exercises). 
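One possible shape for the accumulating-parameter version is sketched below. The name flattenAcc and the local helper go are ours, and the Tree type is repeated only to keep the fragment self-contained; the exercise may intend a different formulation.

```haskell
data Tree a = Null | Node (Tree a) a (Tree a)

-- The direct definition, whose repeated (++) can cost quadratic time
-- on left-leaning trees.
flatten :: Tree a -> [a]
flatten Null         = []
flatten (Node l x r) = flatten l ++ [x] ++ flatten r

-- Linear-time variant: go t acc equals flatten t ++ acc, but each
-- label is consed onto the accumulator exactly once.
flattenAcc :: Tree a -> [a]
flattenAcc t = go t []
  where
    go Null         acc = acc
    go (Node l x r) acc = go l (x : go r acc)
```

The key design choice is the specification go t acc = flatten t ++ acc, from which both clauses of go follow by calculation.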

By definition, a tree is a binary search tree if flattening it returns a list of values 
in strictly increasing order. Thus the label of a binary search tree is greater than any 
label in its left subtree and smaller than any label in its right subtree. 

The definition of a binary search tree can be modified in various ways. For 
example, one can allow duplicate node labels, so that flattening the tree produces 
a list only in nondecreasing order. More useful in practice is to allow labels to be 
records of some kind, with each record containing a key field unique to that record. 
The tree is ordered by key, so flattening it produces a list of records in increasing 
order of key. Such trees can be used to search dictionaries, in which the keys are 
‘words’ of some kind, and the records contain information associated with a given 
word. 

Here is the counterpart of binary search in terms of records and key fields: 

    search :: Ord k => (a -> k) -> k -> Tree a -> Maybe a
    search key k Null = Nothing
    search key k (Node l x r)
      | key x < k  = search key k r
      | key x == k = Just x
      | key x > k  = search key k l

The search returns Nothing if there is no record with the given key; otherwise 
it returns Just x, the (unique) record with the given key. The tree is searched by 
following either the left or the right subtree of a node depending on whether the key 
at the node is greater than or less than the given key. In the worst case, the search 
takes time proportional to the height of the tree, where 
    height :: Tree a -> Nat
    height Null = 0
    height (Node l x r) = 1 + max (height l) (height r)

Thus the search is guaranteed to take $O(\log n)$ steps for a tree of size n only if its
height is $O(\log n)$. Later on we will see how to ensure that the height of a tree is
logarithmic in its size. 
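As a small usage illustration of search with records and keys, consider the following sketch. The three-entry dictionary dict is hypothetical, with each record a pair whose first component serves as the key; the Tree type is repeated to keep the fragment self-contained.

```haskell
data Tree a = Null | Node (Tree a) a (Tree a)

search :: Ord k => (a -> k) -> k -> Tree a -> Maybe a
search _   _ Null = Nothing
search key k (Node l x r)
  | key x < k  = search key k r
  | key x == k = Just x
  | otherwise  = search key k l

-- A hypothetical dictionary ordered by word; flattening it lists the
-- records in increasing order of key.
dict :: Tree (String, Int)
dict = Node (Node Null ("ant", 1) Null) ("bee", 2) (Node Null ("cat", 3) Null)
```

Here search fst "cat" dict yields Just ("cat",3), while a word that is absent, such as "dog", yields Nothing.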

Although two trees of the same size need not have the same height, the two 
measures are not independent. The height h and size n of a tree satisfy the
relationship $h \le n < 2^h$. In particular, $h \ge \lceil \log (n+1) \rceil$. The proof of this fundamental
relationship is by structural induction, a proof we leave as an exercise. By definition
a tree is balanced if the heights of the left and right subtrees of each node differ by
at most one. There are other definitions of what it means for a tree to be balanced,
but we will stick to this one. Although a balanced tree of size n need not have the
minimum possible height $\lceil \log (n+1) \rceil$, its height is always reasonably small. More
precisely, if t is a balanced tree of size n and height h, then we have

$$h \le 1.4404 \log (n+1) + O(1)$$

The proof of this result uses induction in rather an indirect way. Suppose $H(n)$ is
the maximum possible height of a balanced tree of size n. Our objective is to put an
upper bound on $H(n)$. We will do this by turning the problem around. Suppose $S(h)$
is the minimum possible size of a balanced tree of height h. Taking a tree of size n
and height $H(n)$, we therefore have $S(H(n)) \le n$. Hence we can put an upper bound
on $H(n)$ by putting a lower bound on $S(h)$: if $S(h) \ge f(h)$ for an increasing function f,
then $n \ge S(H(n)) \ge f(H(n))$ and so $f^{-1}(n) \ge H(n)$.

Since Null is the only tree with height 0, it is clear that $S(0) = 0$. Similarly, there
is only one kind of tree with height 1, so $S(1) = 1$. The smallest possible balanced
tree with height $h+2$ has two balanced subtrees, one with height $h+1$ and the other
with height h. Hence

$$S(h+2) = S(h+1) + S(h) + 1$$

It is at this point that induction comes in. A simple induction argument shows that
$S(h) = \mathit{fib}(h+2) - 1$, where fib is the Fibonacci function. To complete the proof we
will need the following fact about the Fibonacci function, which can also be proved
by induction. Let $\varphi$ and $\psi$ be the two roots of the equation $x^2 - x - 1 = 0$, that is,
$\varphi = (1+\sqrt{5})/2$ and $\psi = (1-\sqrt{5})/2$. Then $\mathit{fib}(n) = (\varphi^n - \psi^n)/\sqrt{5}$. Furthermore,
since $\psi^n < 1$, we obtain that $\mathit{fib}(n) > (\varphi^n - 1)/\sqrt{5}$. Hence

$$(\varphi^{H(n)+2} - 1)/\sqrt{5} - 1 < \mathit{fib}(H(n)+2) - 1 = S(H(n)) \le n$$

Taking logarithms, we obtain

$$(H(n)+2) \log \varphi < \log (n+1) + O(1)$$

Since $1/\log \varphi \approx 1.4404$, the result now follows.
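The identity relating the minimum size of a balanced tree to the Fibonacci numbers is easy to test before proving it. In this sketch the names minSize and fib are ours, and both sequences are generated by iterating on pairs of consecutive values.

```haskell
-- S(h): minimum size of a balanced tree of height h, from
-- S(0) = 0, S(1) = 1, S(h+2) = S(h+1) + S(h) + 1.
minSize :: Int -> Integer
minSize h = fst (iterate (\(a, b) -> (b, a + b + 1)) (0, 1) !! h)

-- The Fibonacci function with fib 0 = 0 and fib 1 = 1.
fib :: Int -> Integer
fib n = fst (iterate (\(a, b) -> (b, a + b)) (0, 1) !! n)
```

One can then compare minSize h with fib (h + 2) - 1 for as many heights as desired; the induction proof follows the same recurrence.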
We now turn to the task of building a balanced binary search tree from a list of
distinct values. One way to build a tree, though the result is not necessarily balanced,
is to partition the list into two lists, one containing those elements smaller than some
fixed element, and the other containing those elements which are not. That leads to

    mktree :: Ord a => [a] -> Tree a
    mktree [] = Null
    mktree (x:xs) = Node (mktree ys) x (mktree zs)
      where (ys,zs) = partition (<x) xs
    partition p xs = (filter p xs, filter (not . p) xs)

In the best case, when partitioning splits a list of length n into two lists of lengths
$n/2$, the running time $T(n)$ satisfies $T(n+1) = 2T(n/2) + \Theta(n)$. This recurrence, a
slight variant of one we have seen before, also has the solution $T(n) = \Theta(n \log n)$.
In the worst case, when partitioning splits a list of length n into a list of length 0 and
a list of length n, the recurrence relation is $T(n+1) = T(n) + \Theta(n)$ with solution
$T(n) = \Theta(n^2)$.

In order to construct an efficient version of mktree that guarantees a balanced 
tree, we need to maintain information about the heights of the subtrees of a node. 
The way to do that is to modify the type Tree to read 

    data Tree a = Null | Node Nat (Tree a) a (Tree a)

The extra label in a node is the height of the tree. Thus, we have

    height Null = 0
    height (Node h _ _ _) = h

We can build these augmented trees with the help of a smart constructor node,
defined by

    node :: Tree a -> a -> Tree a -> Tree a
    node l x r = Node h l x r where h = 1 + max (height l) (height r)

We will meet another smart constructor in a moment, and yet a third later on.

A balanced tree can be constructed by inserting values one by one into an initially
empty tree:

    mktree :: Ord a => [a] -> Tree a
    mktree = foldr insert Null

The definition of insert starts off easily enough:

    insert x Null = node Null x Null
    insert x (Node h l y r)
      | x < y  = balance (insert x l) y r
      | x == y = Node h l y r
      | y < x  = balance l y (insert x r)
The value x is discarded if it is already present in the tree; otherwise x is inserted
into either the left subtree or the right subtree. But we cannot simply apply node to 
the result since the result may not be a balanced tree. That is where balance comes 
in. It is a second smart constructor, smarter than node in that it restores balance as 
well as installing height information. 

To implement balance we have to consider three cases. First of all, observe that 
a single insertion can increase the height of a tree by at most one. That means 
it is sufficient to implement balance under the assumption that both subtrees are 
balanced and that they differ in height by at most two. The easy case is when the 
two subtrees differ in height by at most one. Then we can implement balance by 
node. The other two cases are entirely symmetrical, so we shall consider just the 
case when the left subtree has height two more than the right subtree, that is, 

    height l = height r + 2

We have to inspect the subtrees of the left subtree, so let l have left subtree ll and
right subtree rl. In the first case, suppose height rl ≤ height ll. Because l is assumed
to be balanced, all of the following four relationships hold:

    height r = height l - 2 = height ll - 1 ≤ height rl ≤ height ll

In this case we can implement balance with a right rotation:

    balance l x r = rotr (node l x r)
    rotr (Node _ (Node _ ll y rl) x r) = node ll y (node rl x r)

Here is a picture of a right rotation:

          x                  y
         / \                / \
        y   r      =>     ll   x
       / \                    / \
     ll   rl                rl   r

To check this leads to a balanced tree, we reason as follows:

      abs (height ll - height (node rl x r))
    =   { definition of height }
      abs (height ll - 1 - max (height rl) (height r))
    =   { since height r ≤ height rl (see above) }
      abs (height ll - 1 - height rl)
    ≤   { since height ll - 1 ≤ height rl ≤ height ll (see above) }
      1


Thus the tree on the right of the picture is indeed balanced. 
In the second case we have height ll < height rl. But again l is assumed to be
balanced, so

    height ll + 1 = height rl

In this case we have to inspect the subtrees of rl, so let lrl and rrl be the left and
right subtrees of rl. In this case all of the following relationships hold:

    height r = height l - 2 = height rl - 1 = height ll = max (height lrl) (height rrl)

In this case we can implement balance with a left rotation followed by a right
rotation:

    balance l x r = rotr (node (rotl l) x r)
    rotl (Node _ ll y (Node _ lrl z rrl)) = node (node ll y lrl) z rrl

Here is the picture:

            x                       z
           / \                    /   \
          y   r                 y       x
         / \          =>       / \     / \
       ll   z                ll  lrl rrl  r
           / \
         lrl  rrl

To check this leads to a balanced tree, note that

    balance l x r = rotr (node (node (node ll y lrl) z rrl) x r)
                  = node (node ll y lrl) z (node rrl x r)

We can then argue as follows:

      abs (height (node ll y lrl) - height (node rrl x r))
    =   { definition of height }
      abs (max (height ll) (height lrl) - max (height rrl) (height r))
    =   { since height rrl ≤ height r (see above) }
      abs (max (height ll) (height lrl) - height r)
    =   { since height lrl ≤ height ll (see above) }
      abs (height ll - height r)
    =   { since height ll = height r (see above) }
      0

The remaining case height r = height l + 2 is treated in an entirely dual manner. To
give the complete definition of balance we will need a function bias, defined by

    bias :: Tree a -> Int
    bias (Node _ l x r) = height l - height r
Then the full definition of balance is given by

    balance :: Tree a -> a -> Tree a -> Tree a
    balance t1 x t2
      | abs (h1 - h2) <= 1 = node t1 x t2
      | h1 == h2 + 2       = rotateR t1 x t2
      | h2 == h1 + 2       = rotateL t1 x t2
      where h1 = height t1; h2 = height t2

    rotateR t1 x t2 = if 0 <= bias t1 then rotr (node t1 x t2)
                      else rotr (node (rotl t1) x t2)
    rotateL t1 x t2 = if bias t2 <= 0 then rotl (node t1 x t2)
                      else rotl (node t1 x (rotr t2))

The definition returns an error when applied to two trees whose heights differ by 
more than two, but it is easy enough to define a balancing function that works for 
two trees of any height. This function, which we will call gbalance, will be needed 
in the following section. To compute gbalance t1 x t2, suppose first that $h_1 > h_2 + 2$.
In this case the subtrees $r_1, r_2, \ldots$ along the right spine of t1 can be traversed to find
a subtree $r = r_k$ satisfying

    0 ≤ height r - height t2 ≤ 1

Such a tree is guaranteed to exist because the subtrees $r_1, r_2, \ldots$ decrease in height
by at least one and at most two at each step. Furthermore, if l is the left-sibling of r,
then

    abs (height l - height (node r x t2)) ≤ 2

because t1 is a balanced tree and abs (height l - height r) ≤ 1. That means l and
node r x t2 can be combined with balance. Rebalancing can increase the height of a
tree by at most one, so further rebalancing up the tree maintains the precondition on
balance. The traversal is captured by balanceR, defined by

    balanceR :: Set a -> a -> Set a -> Set a
    balanceR (Node _ l y r) x t2 = if height r >= height t2 + 2
                                   then balance l y (balanceR r x t2)
                                   else balance l y (node r x t2)

The situation is dual when $h_2 > h_1 + 2$ and is expressed by a function balanceL,
whose definition is left as an exercise. It is clear that balanceR t1 x t2 takes $O(h_1 - h_2)$
steps, where $h_1$ and $h_2$ are the heights of t1 and t2. Dually, balanceL t1 x t2 takes
$O(h_2 - h_1)$ steps.

With that, the complete definition of gbalance is as follows: 
    gbalance :: Set a -> a -> Set a -> Set a
    gbalance t1 x t2
      | abs (h1 - h2) <= 2 = balance t1 x t2
      | h1 > h2 + 2        = balanceR t1 x t2
      | h1 + 2 < h2        = balanceL t1 x t2
      where h1 = height t1; h2 = height t2

Evaluation of balance certainly takes constant time, even though gbalance does not.
That means each insertion takes logarithmic time in the size of the tree, and building
the tree takes $\Theta(n \log n)$ steps.
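The pieces developed so far fit together as in the following self-contained sketch. It makes some assumptions that differ slightly from the text: Int stands in for Nat, rotateR and rotateL are inlined into balance, extra clauses make bias, rotr and rotl total, and the helpers flatten and balanced are ours, included only so that the invariants can be checked.

```haskell
data Tree a = Null | Node Int (Tree a) a (Tree a)

height :: Tree a -> Int
height Null           = 0
height (Node h _ _ _) = h

-- Smart constructor that installs the height label.
node :: Tree a -> a -> Tree a -> Tree a
node l x r = Node (1 + max (height l) (height r)) l x r

bias :: Tree a -> Int
bias Null           = 0
bias (Node _ l _ r) = height l - height r

rotr, rotl :: Tree a -> Tree a
rotr (Node _ (Node _ ll y rl) x r) = node ll y (node rl x r)
rotr t = t
rotl (Node _ ll y (Node _ lrl z rrl)) = node (node ll y lrl) z rrl
rotl t = t

-- Precondition: both subtrees balanced, heights differing by at most two.
balance :: Tree a -> a -> Tree a -> Tree a
balance t1 x t2
  | abs (h1 - h2) <= 1 = node t1 x t2
  | h1 == h2 + 2 = rotr (node (if bias t1 >= 0 then t1 else rotl t1) x t2)
  | h2 == h1 + 2 = rotl (node t1 x (if bias t2 <= 0 then t2 else rotr t2))
  | otherwise    = error "balance: heights differ by more than two"
  where h1 = height t1; h2 = height t2

insert :: Ord a => a -> Tree a -> Tree a
insert x Null = node Null x Null
insert x t@(Node _ l y r)
  | x < y     = balance (insert x l) y r
  | x == y    = t
  | otherwise = balance l y (insert x r)

mktree :: Ord a => [a] -> Tree a
mktree = foldr insert Null

flatten :: Tree a -> [a]
flatten t = go t []
  where go Null acc           = acc
        go (Node _ l x r) acc = go l (x : go r acc)

-- Checks the balance condition at every node.
balanced :: Tree a -> Bool
balanced Null           = True
balanced (Node _ l _ r) =
  abs (height l - height r) <= 1 && balanced l && balanced r
```

Building a tree from an already sorted list, the bad case for the partition-based mktree, is a useful first experiment with this sketch.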

Can we build a binary search tree from a list of elements over an arbitrary ordered 
type in better than $\Theta(n \log n)$ steps? To answer this question, observe first that any
information we can discover about the elements of the underlying type arises solely
as a result of comparison tests of the form x ≤ y or x == y between the elements.
That means it is sufficient to put a lower bound on the number of such comparisons
required in the construction of the tree. If we can show that, say, $\Omega(f(n))$ comparison
tests are needed in the worst case, then that is a lower bound on the total time to
build the tree. The argument is not valid when we are building a tree of integers or
words, since there may be cunning methods that avoid comparison tests altogether.
Now suppose the computation of mktree on a list of length n can be achieved with
$B(n)$ comparison tests of the form x ≤ y. Then, since we can sort a list of elements
over an arbitrary ordered type by

    sort :: Ord a => [a] -> [a]
    sort = flatten . mktree

and since flatten involves no comparisons whatsoever, it follows that sorting a list
of elements can be achieved with $B(n)$ comparisons.

We now put a lower bound on $B(n)$. Every algorithm based on binary comparisons
can be associated with a certain tree, called a decision tree. The decision tree is a
binary tree whose labels are binary comparisons of the form x ≤ y. The left subtree
is a decision tree for the case x ≤ y and the right subtree is a decision tree for the case
x > y. The execution of any sorting algorithm based on binary comparisons traces
a path from the root of the decision tree to a leaf, each leaf being associated with
a unique permutation that sorts the list. Each permutation of the input determines
a sorted list, so there have to be at least $n!$ leaves for an input of length n because
there are $n!$ possible permutations that can sort the list. Since a binary tree of height
h has no more than $2^h$ leaves, the decision tree has to have some height h such that
$n! \le 2^h$. Taking logarithms, that means $h \ge \log (n!)$. To estimate the right-hand side,
we can use Stirling's approximation (see Answer 2.5) to arrive at $h = \Omega(n \log n)$.
But h estimates the total number of comparisons that may be needed to sort the list
in the worst case, so we have our lower bound $B(n) = \Omega(n \log n)$. So, the answer
to the original question is: No, we cannot build a binary search tree in better than
$\Theta(n \log n)$ steps.
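If Stirling's approximation is not to hand, the estimate $\log (n!) = \Omega(n \log n)$ can also be obtained by a cruder argument, sketched here as an alternative to the route taken in the text: discard the smaller half of the factors of $n!$ and bound each remaining factor from below by $n/2$.

```latex
\log (n!) \;=\; \sum_{k=1}^{n} \log k
          \;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \log k
          \;\ge\; \frac{n}{2} \log \frac{n}{2}
          \;=\; \Omega(n \log n)
```

The second inequality holds because the sum has at least $n/2$ terms, each of which is at least $\log (n/2)$.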


4.4 Dynamic sets 

Sets that can grow or shrink over time are called dynamic sets. Operations on such 
sets include a membership test, adding a value to a set, and deleting an element 
from a set. One may also want to take the union of two sets, or to split a set into two 
sets, one containing those elements at most some given value, and the other those 
elements greater than the value. As a special case of set union, one may also want 
to combine two sets when it is known that the elements of the first are all less than 
any element of the second. Thus, splitting a set and then combining the results gives 
back the original set. 

In this section we will show how to implement these operations when sets are 
represented by balanced binary search trees: 

type Set a = Tree a 

The membership test is a simple variant of the function search of the previous 
section: 

    member :: Ord a => a -> Set a -> Bool
    member x Null = False
    member x (Node _ l y r) | x < y  = member x l
                            | x == y = True
                            | x > y  = member x r

Insertion is implemented by the function insert of the previous section. The function
delete is more interesting:

    delete :: Ord a => a -> Set a -> Set a
    delete x Null = Null
    delete x (Node _ l y r) | x < y  = balance (delete x l) y r
                            | x == y = combine l r
                            | x > y  = balance l y (delete x r)

Deleting a single value from a tree can reduce its height by at most one, so the smart
constructor balance can be used to restore balance. Recall that balance t1 x t2 was
defined only for the case that the heights of t1 and t2 differed by at most two. That
leaves combine, which in effect has to concatenate two trees.

We will give two definitions of combine, the second of which generalises the first.
In the first definition, combine is defined only for two balanced trees that differ in
height by at most one. This is certainly sufficient for its use in delete. The easy case
is when one of the trees is null; in such a case we can simply return the other tree.
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When neither tree is null we have to find an appropriate label for the combined tree, 
and there are two sensible options: either take the leftmost label of the second tree, 
or the rightmost label of the first tree. We choose the former, defining deleteMin by 

deleteMin :: Ord a ^ Set a —)■ {a, Set a) 
deleteMin {Node _ Null xr) = {x, r) 

deleteMin {Node-lx r) = {y, balance t x r) where {y d) = deleteMin I 

The function deleteMin returns the minimum element of a nonempty set, together 
with the set that remains after deleting the minimum element. The function balance 
can then be invoked to ensure that this set is balanced (see Exercise 4.15). Now we 
can define combine by 

combine :: Ord a => Set a -> Set a -> Set a
combine l Null = l
combine Null r = r
combine l r    = balance l x t where (x, t) = deleteMin r

The second definition of combine is exactly the same, except that balance is replaced
by the more general function gbalance of the previous section. Therefore combine
can be used to combine any two sets as long as all the elements of the first set are
less than any element of the second set. Combining two sets of sizes m and n takes
$O(\log n + \log m)$ steps.
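
As a minimal sketch of how member, delete, deleteMin and the first definition of combine fit together, the code below can be loaded as it stands. The Tree type, node, height and balance belong to the previous section, which is not reproduced here, so the versions given are stand-in assumptions (a conventional AVL rebalancing with cached heights); bias, rotl, rotr, flatten and balanced are likewise helper names chosen for this sketch.

```haskell
-- Assumed stand-in for the height-caching Tree type of the previous section.
data Tree a = Null | Node Int (Tree a) a (Tree a)
type Set a = Tree a

height :: Tree a -> Int
height Null = 0
height (Node h _ _ _) = h

-- Smart constructor: builds a node and recomputes its cached height.
node :: Tree a -> a -> Tree a -> Tree a
node l x r = Node (1 + max (height l) (height r)) l x r

bias :: Tree a -> Int
bias Null = 0
bias (Node _ l _ r) = height l - height r

rotr, rotl :: Tree a -> Tree a
rotr (Node _ (Node _ ll y lr) x r) = node ll y (node lr x r)
rotr t = t
rotl (Node _ l x (Node _ rl y rr)) = node (node l x rl) y rr
rotl t = t

-- Assumed AVL-style balance: l and r are balanced and their heights
-- differ by at most two.
balance :: Tree a -> a -> Tree a -> Tree a
balance l x r
  | hl > hr + 1 = rotr (node (if bias l < 0 then rotl l else l) x r)
  | hr > hl + 1 = rotl (node l x (if bias r > 0 then rotr r else r))
  | otherwise   = node l x r
  where
    hl = height l
    hr = height r

insert :: Ord a => a -> Set a -> Set a
insert x Null = node Null x Null
insert x t@(Node _ l y r)
  | x < y     = balance (insert x l) y r
  | x > y     = balance l y (insert x r)
  | otherwise = t

member :: Ord a => a -> Set a -> Bool
member _ Null = False
member x (Node _ l y r)
  | x < y     = member x l
  | x > y     = member x r
  | otherwise = True

-- Minimum element of a nonempty set, plus the set without it.
deleteMin :: Ord a => Set a -> (a, Set a)
deleteMin Null = error "deleteMin: empty set"
deleteMin (Node _ Null x r) = (x, r)
deleteMin (Node _ l x r) = (y, balance t x r) where (y, t) = deleteMin l

-- First definition of combine: every element of l is below every element of r.
combine :: Ord a => Set a -> Set a -> Set a
combine l Null = l
combine Null r = r
combine l r = balance l x t where (x, t) = deleteMin r

delete :: Ord a => a -> Set a -> Set a
delete _ Null = Null
delete x (Node _ l y r)
  | x < y     = balance (delete x l) y r
  | x > y     = balance l y (delete x r)
  | otherwise = combine l r

flatten :: Set a -> [a]
flatten Null = []
flatten (Node _ l x r) = flatten l ++ [x] ++ flatten r

-- Checks the height-balance invariant; used for testing only.
balanced :: Set a -> Bool
balanced Null = True
balanced (Node _ l _ r) =
  abs (height l - height r) <= 1 && balanced l && balanced r
```

For instance, building a set with foldr insert Null [1..20] and then deleting the even numbers one at a time should leave a balanced tree that flattens to the odd numbers in order.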

The final function we will implement is a function split with type

split :: Ord a => a -> Set a -> (Set a, Set a)

The value of split x t is a pair of sets, the first containing those elements of t which
are at most x, and the second those elements that are greater than x. Thus, combining
the two sets gives back the original set. In symbols,

split x xs = (ys, zs)  $\Rightarrow$  combine ys zs = xs

Even more briefly, uncurry combine . split x = id. For example, consider the 1973
two-volume edition of the Shorter Oxford English Dictionary. If soed is the set
of all the words in the dictionary, then the contents of each volume is given by
split "Markworthy" soed.

To define split we have to split a tree into pieces and then sew the pieces together
to make the final pair of sets:

split x t = sew (pieces x t)

A piece consists of a tree minus one of its subtrees, so it consists of a label and
either a left or a right subtree:

data Piece a = LP (Set a) a | RP a (Set a)

A left piece LP l x is missing its right subtree, and a right piece RP x r is missing its
left subtree. The function pieces is defined by

pieces :: Ord a => a -> Set a -> [Piece a]
pieces x t = addPiece t [] where
  addPiece Null ps = ps
  addPiece (Node _ l y r) ps | x < y  = addPiece l (RP y r : ps)
                             | x >= y = addPiece r (LP l y : ps)
For example, pieces 9 t, where t is the tree

[Figure: the tree t]

produces the list of three pieces

[Figure: the three pieces]

in which the missing tree is indicated by a dashed line. We can sew a list of pieces
together by

sew :: [Piece a] -> (Set a, Set a)
sew = foldl step (Null, Null)
  where step (t1, t2) (LP t x) = (gbalance t x t1, t2)
        step (t1, t2) (RP x t) = (t1, gbalance t2 x t)

For example, sewing the three pieces above produces the two trees

[Figure: the two resulting trees]

We claim that split x t takes $O(h)$ steps, where h = height t. Certainly, pieces x t
takes this time, so we have to show that sew does too. If we define the height of a
piece to be the height of the tree associated with the piece, then pieces x t produces
a list of pieces whose heights $h_1, h_2, \ldots, h_k$ are strictly increasing and bounded above
by $h$. For example, the pieces pictured above have heights 0, 1, 2. The total cost of
sew is proportional to

$$(h_1 - 0) + (h_2 - h_1) + \cdots + (h_k - h_{k-1}) \le h$$

because each call of gbalance t1 x t2 takes time proportional to the difference
between the heights of t1 and t2. Thus both pieces and sew take logarithmic time in
the size of the sets, so split does too. We will need combine and a variant of split in
Chapter 14.
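
The way pieces and sew cooperate can be tried out with the following sketch. It makes two simplifying assumptions that are not in the text: the tree is a plain, unbalanced binary search tree without a cached height, and the plain constructor Node is used where the text uses gbalance. The sketch therefore illustrates only which elements end up on which side, not the balancing or the running time; insert and flatten are helper functions added for the demonstration.

```haskell
-- Assumed simplification: an unbalanced search tree with no cached height.
data Tree a = Null | Node (Tree a) a (Tree a)

-- A piece is a label together with either its left or its right subtree.
data Piece a = LP (Tree a) a | RP a (Tree a)

-- Walk down the search path for x, recording at each node the part of the
-- tree that is not on the path. The deepest piece ends up first in the list.
pieces :: Ord a => a -> Tree a -> [Piece a]
pieces x t = addPiece t []
  where
    addPiece Null ps = ps
    addPiece (Node l y r) ps
      | x < y     = addPiece l (RP y r : ps)
      | otherwise = addPiece r (LP l y : ps)

-- Sew the pieces back together into two trees. Node stands in for gbalance.
sew :: [Piece a] -> (Tree a, Tree a)
sew = foldl step (Null, Null)
  where
    step (t1, t2) (LP t x) = (Node t x t1, t2)
    step (t1, t2) (RP x t) = (t1, Node t2 x t)

split :: Ord a => a -> Tree a -> (Tree a, Tree a)
split x t = sew (pieces x t)

-- Helpers for the demonstration only.
insert :: Ord a => a -> Tree a -> Tree a
insert x Null = Node Null x Null
insert x t@(Node l y r)
  | x < y     = Node (insert x l) y r
  | x > y     = Node l y (insert x r)
  | otherwise = t

flatten :: Tree a -> [a]
flatten Null = []
flatten (Node l x r) = flatten l ++ [x] ++ flatten r
```

With t built from the numbers 1 to 10, split 6 t should give two trees that flatten to [1..6] and [7..10], and concatenating the two flattened lists gives back flatten t.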


4.5 Chapter notes 

According to Knuth [10], the first publication to describe binary search (for the
special case $n = 2^k - 1$) appeared in 1946. The first version that worked for all n was
not published until 1960. Binary search is easy to get wrong; see Bentley’s book [4]
for an interesting discussion of his experiences in getting professional programmers
to implement binary search. Saddleback search was so named by David Gries, see
[3, 6, 8], probably because the shape of the three-dimensional grid, with the smallest
element at the bottom left, the largest at the top right, and two wings, is a bit like an
equestrian saddle.

The functions sew and pieces of the final section are closely related to a more 
general way of taking a tree apart and joining two trees, called the Zipper, see [9]. 

Balanced binary trees are also called AVL trees, after their inventors Adelson- 
Velskii and Landis [1]; see also [10]. There are many other balanced tree schemes; 
for example [5] describes red-black trees, and [2] describes a simple scheme based 
on two operations, skew and split.

Exercises

Exercise 4.1 The rule of floors states that for integers $n$ and real numbers $r$ we
have $n \le \lfloor r \rfloor \Leftrightarrow n \le r$. This useful rule will appear in a number of problems. Using
just the rule of floors (no case analysis) prove that for integers $a$ and $b$ we have
$a < (a + b) \mathbin{\mathrm{div}} 2 < b$ if and only if $a + 1 < b$.

The dual rule of ceilings states that for integers $n$ and real numbers $r$ we have
$\lceil r \rceil \le n \Leftrightarrow r \le n$. Using this rule prove that if $h$ is an integer such that $n < 2^h$, then
$\lceil \log (n + 1) \rceil \le h$. This result was stated in Section 4.3.

Exercise 4.2 Look again at the expression

head ([x | x <- [a+1 .. m], t <= f x] ++ [x | x <- [m+1 .. b], t <= f x])

where $a < m < b$ and $f(a) < t \le f(b)$. If nothing else is assumed about $f$, then we
cannot assert that the first list is empty if $f(m) < t$. Nevertheless, the definition of
smallest (a, b) returns some value. What is it?

Exercise 4.3 Recall that the exact number $T(n)$ of evaluations of $f$ required in the
worst case of evaluating smallest (a, b) f t, where $n = b - a + 1$ is the number of
integers in the interval, satisfies $T(2) = 0$ and

$$T(n) = T(\lceil (n + 1)/2 \rceil) + 1$$

for $n > 2$. Use the rule of ceilings to show that $T(n) = \lceil \log (n - 1) \rceil$.

Exercise 4.4 Following on from the previous question, given that $f(a) < t \le f(b)$,
show that any algorithm for computing smallest (a, b) f t requires $\lceil \log (n - 1) \rceil$
comparison tests of the form $t \le f(x)$.

Exercise 4.5 What are the positions of 472 in the grid of Figure 4.1? 

Exercise 4.6 With $f(x, y) = x^3 + y^3$, what is the result of saddleback search for
$t = 1729$? Does the final algorithm return the same result?

Exercise 4.7 To obtain a linear-time definition of flatten we can use an accumulating
parameter. There is more than one way of doing so, but a simple method is to
introduce flatcat defined by

flatcat :: Tree a -> [a] -> [a]
flatcat t xs = flatten t ++ xs

We have flatten t = flatcat t [ ], so it remains to produce a recursive definition of
flatcat that does not use flatten or any ++ operations. Give details of the synthesis.

Exercise 4.8 Prove by structural induction that

$$\mathit{height}\ t \le \mathit{size}\ t < 2^{\mathit{height}\ t}$$

for all binary trees t.

Exercise 4.9 The definition

partition p xs = (filter p xs, filter (not . p) xs)

involves two traversals of its second argument. Give a definition of partition that
makes only one traversal of the input.

Exercise 4.10 Another way to build a binary tree for a list containing duplicates is 
to build a tree of type Tree [a] in which node labels are lists of equal values. Show 
how to build such a tree. 

Exercise 4.11 Consider the recurrences

$$B(n + 1) = 2B(n/2) + \Theta(n)$$

$$W(n + 1) = W(n) + \Theta(n)$$

for the best and worst cases for building a binary search tree by partitioning the
input. Prove that $B(n) = \Theta(n \log n)$ and $W(n) = \Theta(n^2)$.

Exercise 4.12 Show that $\log (n!) = \Omega(n \log n)$ without using Stirling’s approximation.

Exercise 4.13 For the (second) definition of combine we have

flatten (combine t1 t2) = flatten t1 ++ flatten t2

Anticipating the following chapter, give a definition of merge for which

flatten (union t1 t2) = merge (flatten t1) (flatten t2)

Exercise 4.14 One method of defining union is to flatten one of the trees and then
insert the elements one by one into the other tree:

union :: Ord a => Set a -> Set a -> Set a
union t1 t2 = foldr insert t1 (flatten t2)

Supposing the first tree has size m and the second tree has size n, how long does
union take?

Another method is to flatten both trees, merge the results to obtain a sorted list,
and then to build a tree from the sorted list:

union t1 t2 = build (merge (flatten t1) (flatten t2))

We can build a tree from a sorted list in linear time if we bring in arrays. Here is the
first line of the definition:

build xs = from (0, n) (listArray (0, n - 1) xs) where n = length xs

(recall from Chapter 3 that the expression listArray (0, length xs - 1) xs converts
a list of length n into an array whose indices run from 0 to n - 1). Construct a
definition of from. How long does this method of defining union take?

Exercise 4.15 Why is the use of balance justified in the definitions of deleteMin
and combine?

Exercise 4.16 Give the definition of balanceL.

Exercise 4.17 Suppose pair f (x, y) = (f x, f y). Using pair give a one-line, linear-time
definition of split x.


Answers

Answer 4.1 Here is the proof:

      $a < (a + b) \mathbin{\mathrm{div}} 2 < b$
$\Leftrightarrow$    { definition of div }
      $a < \lfloor (a + b)/2 \rfloor < b$
$\Leftrightarrow$    { arithmetic }
      $a + 1 \le \lfloor (a + b)/2 \rfloor < b$
$\Leftrightarrow$    { rule of floors (twice) }
      $a + 1 \le (a + b)/2 < b$
$\Leftrightarrow$    { arithmetic }
      $a + 1 < b$

The rule of floors is used twice in the above proof, the second appeal being the
equivalent form: $\lfloor r \rfloor < n \Leftrightarrow r < n$.

For the second part we have $n < 2^h \Leftrightarrow n + 1 \le 2^h \Leftrightarrow \log (n + 1) \le h$, and the result
follows by appeal to the rule of ceilings.

Answer 4.2 We have smallest (a, b) = x $\Rightarrow$ $f(x) < t \le f(x + 1)$.

Answer 4.3 The proof is by induction and for the induction step we have to show

$$\lceil \log (n - 1) \rceil = \lceil \log (\lceil (n + 1)/2 \rceil - 1) \rceil + 1$$

To do so, we can reason by indirect equality, showing that $a = b$ by showing $a \le k$
if and only if $b \le k$ for all $k$. Using the rule of ceilings on the left-hand side gives
$\lceil \log (n - 1) \rceil \le k \Leftrightarrow n - 1 \le 2^k$. On the right-hand side we have

      $\lceil \log (\lceil (n + 1)/2 \rceil - 1) \rceil + 1 \le k$
$\Leftrightarrow$    { arithmetic }
      $\lceil \log (\lceil (n + 1)/2 \rceil - 1) \rceil \le k - 1$
$\Leftrightarrow$    { rule of ceilings }
      $\log (\lceil (n + 1)/2 \rceil - 1) \le k - 1$
$\Leftrightarrow$    { arithmetic }
      $\lceil (n + 1)/2 \rceil - 1 \le 2^{k-1}$
$\Leftrightarrow$    { arithmetic }
      $\lceil (n + 1)/2 \rceil \le 2^{k-1} + 1$
$\Leftrightarrow$    { rule of ceilings }
      $(n + 1)/2 \le 2^{k-1} + 1$
$\Leftrightarrow$    { arithmetic }
      $n - 1 \le 2^k$

establishing the result.

Answer 4.4 From the answer to Exercise 4.2, we have that smallest (a, b) f t can
return any one of $b - a$ answers, namely those of the form $f(x) < t \le f(x + 1)$ for
some $x$ in the range $a \le x < b$. A decision tree with internal nodes labelled with tests
of the form $t \le f(x)$ therefore has to have a height $h$ satisfying $2^h \ge b - a$. Since
$b - a = n - 1$ we have the lower bound $h \ge \lceil \log (n - 1) \rceil$.

Answer 4.5 The positions are (0,9), (5,6), (7,5), and (9,0). 

Answer 4.6 The four answers are (9,10), (10,9), (1,12), and (12,1). The only 
issue is the order in which these four values are produced. With saddleback search 
the answers are found in the order (12,1), (10,9), (9,10), (1,12), but they are listed 
in reverse order. With the final algorithm it depends on the order in which the 
subrectangles are searched. It turns out that the final algorithm produces the list 
[(9,10),(1,12),(10,9),(12,1)]. 

Answer 4.7 Here is the recursive case:

    flatcat (Node l x r) xs
=     { specification of flatcat }
    flatten (Node l x r) ++ xs
=     { definition of flatten }
    flatten l ++ [x] ++ flatten r ++ xs
=     { specification of flatcat }
    flatten l ++ [x] ++ flatcat r xs
=     { specification of flatcat }
    flatcat l (x : flatcat r xs)

Answer 4.8 There are two cases, depending on whether the tree is Null or a Node.
The former case is easy. For the latter, we will prove only the second inequality,
which can be written in the form $\mathit{size}\ t \le 2^{\mathit{height}\ t} - 1$ since both size and height are
integers. We reason as follows, where t = Node l x r:

      $\mathit{size}\ (\mathit{Node}\ l\ x\ r)$
$=$     { definition of size }
      $\mathit{size}\ l + 1 + \mathit{size}\ r$
$\le$     { induction hypothesis }
      $2^{\mathit{height}\ l} - 1 + 1 + 2^{\mathit{height}\ r} - 1$
$\le$     { arithmetic }
      $2^{1 + \max (\mathit{height}\ l) (\mathit{height}\ r)} - 1$
$=$     { definition of height }
      $2^{\mathit{height}\ t} - 1$


Answer 4.9 We have partition p [] = ([], []). Setting (ys, zs) = partition p xs gives

    partition p (x : xs)
=     { definition of partition }
    (filter p (x : xs), filter (not . p) (x : xs))
=     { definition of filter }
    if p x then (x : ys, zs) else (ys, x : zs)

from which we obtain

partition p xs = foldr op ([], []) xs
  where op x (ys, zs) = if p x then (x : ys, zs) else (ys, x : zs)

Answer 4.10 The best way of computing mktree is to partition the input into three
lists: those elements less than some given element, those elements equal to it, and
those elements greater than it:

partitions :: Ord a => a -> [a] -> ([a], [a], [a])
partitions y = foldr op ([], [], [])
  where op x (us, vs, ws) | x < y  = (x : us, vs, ws)
                          | x == y = (us, x : vs, ws)
                          | x > y  = (us, vs, x : ws)

Now we can define

mktree :: Ord a => [a] -> Tree [a]
mktree [] = Null
mktree xs = Node (mktree us) vs (mktree ws)
  where (us, vs, ws) = partitions (head xs) xs

This definition of partitions will be needed in Chapter 6.

Answer 4.11 By unfolding the recurrence for B we obtain

$$\begin{aligned}
B(n + 1) &= cn + 2B(n/2) \\
         &\le 2cn + 4B((n/2 - 1)/2) \\
         &\le 2cn + 4B(n/4) \\
         &\le \cdots \\
         &\le kcn + 2^k B(n/2^k)
\end{aligned}$$

giving $B(n) = O(n \log n)$. We also have $B(n) = \Omega(n \log n)$. Similarly,

$$\begin{aligned}
W(n + 1) &= cn + W(n) \\
         &= cn + c(n - 1) + W(n - 1) \\
         &= cn + c(n - 1) + \cdots + c \\
         &= cn(n + 1)/2
\end{aligned}$$

giving $W(n) = \Theta(n^2)$.

Answer 4.12 For even n we have

$$n! \ge n(n - 1)(n - 2) \cdots (n/2) \ge (n/2)^{n/2}$$

so $\log (n!) \ge (n/2) \log (n/2) \ge (n/4) \log n$ for $n \ge 4$. The case for odd n is similar.

Answer 4.13 The definition is

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys) | x < y  = x : merge xs (y : ys)
                        | x == y = x : merge xs ys
                        | x > y  = y : merge (x : xs) ys

The function merge merges two sorted lists, removing duplicates. Merging two lists
of lengths m and n takes $\Theta(m + n)$ steps. We will return to merging in the following
chapter.

Answer 4.14 Inserting a new element into a balanced tree of size m takes $\Theta(\log m)$
steps and produces a tree of size m + 1. Hence, if we do it for n elements, then the
number of steps is

$$\log m + \log (m + 1) + \cdots + \log (m + n - 1)$$

which is $\Theta((m + n) \log (m + n))$ steps.

The definition of from is

from (l, r) xa =
  if l == r then Null else node (from (l, m) xa) (xa ! m) (from (m + 1, r) xa)
  where m = (l + r) div 2

This method of defining union takes $\Theta(m + n)$ steps.

Answer 4.15 In the case of deleteMin, if l and r are two trees whose height difference
is at most one and t is a tree whose height differs from that of l by at most one,
then the height difference of t and r is at most two, meeting the precondition on
balance. The argument for combine is similar.

Answer 4.16 We have

balanceL :: Set a -> a -> Set a -> Set a
balanceL t1 x (Node _ l y r) = if height l >= height t1 + 2
                               then balance (balanceL t1 x l) y r
                               else balance (node t1 x l) y r

Answer 4.17 We have split x = pair mktree . partition (<= x) . flatten.




Chapter 5 


Sorting 


If binary search is the simplest example of the divide-and-conquer strategy, then 
sorting is arguably the most representative. By sorting we mean putting the elements 
of a given list into nondecreasing order. In this chapter we consider two basic 
divide-and-conquer algorithms for sorting. Nothing is assumed about the elements 
of the input except that they can be compared under $\le$, so the type of sort is
sort :: Ord a => [a] -> [a]. In both algorithms the problem is divided into two
subproblems, each of about half the size of the original, which are then combined
to give the final result. Together, the dividing and combining phases involve $\Theta(n)$
comparisons on an input of length n, so the associated recurrence relation takes the
form

$$T(n) = 2T(n/2) + \Theta(n)$$

which we have seen has the solution $T(n) = \Theta(n \log n)$. As we have also seen at
the end of the previous chapter, this is asymptotically the best bound for a sorting 
algorithm based on comparisons. 

We will also consider one other comparison-based algorithm, and two more that 
assume additional properties of the elements. All five algorithms have a common 
theme, which is that sorting can be viewed as a two-stage process in which one first 
builds a tree of some kind and then flattens it. In symbols, 

sort = flatten . mktree

It seems a bit wasteful of space to erect a potentially large data structure and then 
demolish it, but under lazy evaluation the tree only exists in very small pieces at any 
one moment. In any case, it is usually easy to synthesise another definition of sort 
in which the tree no longer appears. Building a tree encapsulates the division stage 
of a divide-and-conquer algorithm, while flattening it captures the combining phase. 

There are many different ways of sorting and it is not straightforward to say which
is best. When expressed functionally, some famous sorting algorithms have different
characteristics from when expressed as imperative code. What is a good algorithm
in one setting may not be so good in the other. Naturally, we will concentrate only
on good functional sorting algorithms.

A good sorting algorithm should aim for four qualities, whose combination is
not always easy to achieve. First, it should be fast. Ideally, not only should it be
asymptotically optimal in the number of comparisons it makes, but the constants
involved in other operations should also be small. What would you prefer for sorting
a list of small size n, an algorithm that takes $n^2$ steps or one that takes $1000\, n \log n$
steps? An algorithm with a blindingly fast performance on average, but a quadratic-time
performance in the worst case, might be acceptable. But then again it might
not. Second, the algorithm should be smooth, meaning that the more sorted the
input is, the faster the algorithm performs. In real life, large amounts of data are 
unlikely to be in truly random order and a good algorithm should take advantage 
of this fact. Third, the algorithm should be stable. When sorting records by key 
values, records with equal keys should appear in the same order in the output as in 
the input. One can always convert an unstable sorting algorithm into a stable one 
by first recording the position of each element in the list, then sorting the input by 
key, keeping elements with equal keys in separate ‘buckets’. Then each bucket is 
sorted by the positions of the elements in the original list. Finally, the positions 
are discarded. Bucket sort is one of the algorithms we will describe later in the 
chapter. Finally, a sorting algorithm should be compact, meaning that it should be 
economical in its use of space as well as running time. This is much more difficult 
to achieve in a purely functional setting, especially a lazy one, and we will quietly 
ignore the problem of compactness in what follows. In summary, sorting algorithms, 
like cars, should be fast, smooth, stable, and compact. 


5.1 Quicksort 

Following on from the previous chapter, our first sorting algorithm arises as a result 
of flattening a binary search tree. Here is the relevant data type again: 

data Tree a = Null | Node (Tree a) a (Tree a)

The function mktree builds a tree:

mktree :: Ord a => [a] -> Tree a
mktree [] = Null
mktree (x : xs) = Node (mktree ys) x (mktree zs)
  where (ys, zs) = partition (< x) xs

The function partition (< x) splits a list into two lists, comprising those elements
less than x and those not less than x:

partition :: (a -> Bool) -> [a] -> ([a], [a])
partition p xs = foldr op ([], []) xs
  where op x (ys, zs) = if p x then (x : ys, zs) else (ys, x : zs)

The function flatten flattens a tree:

flatten :: Tree a -> [a]
flatten Null = []
flatten (Node l x r) = flatten l ++ [x] ++ flatten r

Now define

qsort = flatten . mktree

It is easy to eliminate the tree (see Exercise 5.2) and the result is one version of the
famous algorithm known as Quicksort:

qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (x : xs) = qsort ys ++ [x] ++ qsort zs
  where (ys, zs) = partition (< x) xs

However, this version of qsort is not fast, not smooth, and not compact. But it is
stable. Stability is a result of the fact that partition does not change the order of
the elements in the input. It is not fast because in the worst case qsort requires
$\Theta(n^2)$ comparisons to sort a list of length n. One worst case arises when the input
is already sorted, so qsort certainly isn’t smooth. The problem lies in the choice
of the partitioning element (or pivot) x, a choice that determines how equal in size
the two sublists produced by partition will be. Choosing the first element of the
input as pivot can lead to two very unbalanced subproblems. Better is to choose
a random element of the input; better still is to choose the median element. But
finding the median takes time. We will consider two median-finding algorithms
in the following chapter. It is the case that qsort is fast on average, requiring only
$\Theta(n \log n)$ steps with a small constant of proportionality. Finally, qsort as defined
above can be very inefficient in its use of space, requiring $\Theta(n^2)$ units in the worst
case. Space efficiency can be improved by tweaking the definition of qsort but we
won’t go into details.
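
The claims above can be explored with a small sketch that puts the two formulations of qsort side by side. The name qsortTree (for flatten . mktree) and the helper comparisons, which counts the element comparisons made when the first element is always chosen as pivot, are hypothetical names introduced only for this illustration.

```haskell
data Tree a = Null | Node (Tree a) a (Tree a)

-- Single-pass partition that preserves the order of the elements.
partition :: (a -> Bool) -> [a] -> ([a], [a])
partition p = foldr op ([], [])
  where op x (ys, zs) = if p x then (x : ys, zs) else (ys, x : zs)

mktree :: Ord a => [a] -> Tree a
mktree [] = Null
mktree (x : xs) = Node (mktree ys) x (mktree zs)
  where (ys, zs) = partition (< x) xs

flatten :: Tree a -> [a]
flatten Null = []
flatten (Node l x r) = flatten l ++ [x] ++ flatten r

-- Quicksort as "build a search tree, then flatten it".
qsortTree :: Ord a => [a] -> [a]
qsortTree = flatten . mktree

-- Quicksort with the intermediate tree eliminated.
qsort :: Ord a => [a] -> [a]
qsort [] = []
qsort (x : xs) = qsort ys ++ [x] ++ qsort zs
  where (ys, zs) = partition (< x) xs

-- Hypothetical helper: the number of element comparisons qsort performs.
-- Partitioning around a pivot compares it with every remaining element.
comparisons :: Ord a => [a] -> Int
comparisons [] = 0
comparisons (x : xs) = length xs + comparisons ys + comparisons zs
  where (ys, zs) = partition (< x) xs
```

On the already sorted list [1..100] every partition is as lopsided as possible, so comparisons gives 99 + 98 + ... + 1 = 4950, whereas on [4,2,6,1,3,5,7], where each pivot happens to be the median, it gives 10.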

Nevertheless, Quicksort is a decent sorting algorithm when implemented in
an imperative setting. In such a setting, Quicksort can be formulated in terms of
mutable arrays rather than lists, and the partitioning phase can be carried out in
place, meaning that the input array can be used as working space for partitioning.
No other space, apart from the stack needed to implement the recursion, is required.
But the space-efficient version sacrifices stability. We will not dwell further on the
merits and demerits of Quicksort, partly because the topic has been addressed in
[1], and partly because there are better functional algorithms for sorting. Additional
aspects of Quicksort are taken up in the exercises.

5.2 Mergesort 

Quicksort is a divide-and-conquer algorithm in which the hard work is done in the 
dividing phase - the combining phase is just concatenation. By contrast, the next 
sorting algorithm, Mergesort, performs more work in the combining phase and less 
in the partition phase. 

Again we begin with a tree. This time the tree is a slightly different species: 

data Tree a = Null | Leaf a | Node (Tree a) (Tree a)

There are no constraints on the order of the elements in a tree and one can build a
tree for lists of arbitrary type. The ordering on the elements comes into play when
we flatten a tree:

flatten :: Ord a => Tree a -> [a]
flatten Null = []
flatten (Leaf x) = [x]
flatten (Node t1 t2) = merge (flatten t1) (flatten t2)

The function merge merges two sorted lists into one:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys) | x <= y    = x : merge xs (y : ys)
                        | otherwise = y : merge (x : xs) ys

Merging two lists of lengths m and n by merge requires at most m + n comparisons
(see the exercises). The cost of flattening a tree depends on the number of leaves in
each subtree. If a tree of size n has two subtrees each of size n/2, then the number
of comparisons $T(n)$ required satisfies

$$T(n) = 2T(n/2) + n$$

with solution $T(n) = \Theta(n \log n)$. It follows that if we can build a tree for a list of
length n in this time, and the tree has the size-balanced property that each node has
two subtrees that differ in size by at most 1, then sorting a list of length n can be
done with $\Theta(n \log n)$ comparisons.

Here is a divide-and-conquer algorithm to build a size-balanced tree:

mktree :: [a] -> Tree a
mktree [] = Null
mktree [x] = Leaf x
mktree xs = Node (mktree ys) (mktree zs) where (ys, zs) = halve xs

The function halve splits a list into two equal halves:

halve xs = (take m xs, drop m xs) where m = length xs div 2

This definition involves three traversals of the list, one to compute its length, and
two more to perform the splitting. No human would split up a list this way. Instead,
they would simply deal out the elements alternately into two piles:

halve = foldr op ([], []) where op x (ys, zs) = (zs, x : ys)

This version of halve produces a different result, but the two sublists have the same 
sizes as before and each is a subsequence of the input. For example, 

halve [1..9] = ([2,4,6,8], [1,3,5,7,9]) 

Since the order of the elements in the tree is immaterial, this definition of halve is 
perfectly adequate. 
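
The two definitions can be compared directly with the following sketch, in which the names halve1 and halve2 are introduced here only to tell them apart.

```haskell
-- First definition: measure the list, then cut it in the middle.
halve1 :: [a] -> ([a], [a])
halve1 xs = (take m xs, drop m xs)
  where m = length xs `div` 2

-- Second definition: deal the elements alternately into two piles,
-- working from the right-hand end of the list.
halve2 :: [a] -> ([a], [a])
halve2 = foldr op ([], [])
  where op x (ys, zs) = (zs, x : ys)
```

For [1..9] the first version returns ([1,2,3,4],[5,6,7,8,9]) and the second ([2,4,6,8],[1,3,5,7,9]): different elements in each half, but the same sizes, and each half is a subsequence of the input.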

The running time of mktree satisfies essentially the same recurrence as flatten
so it takes $\Theta(n \log n)$ steps to build a tree. Clearly the tree has the size-balanced
property. Now if we define

msort = flatten . mktree

then we obtain another famous algorithm called Mergesort. It is easy to eliminate
the tree, and the result is

msort [] = []
msort [x] = [x]
msort xs = merge (msort ys) (msort zs)
  where (ys, zs) = halve xs

Unlike Quicksort, Mergesort has a $\Theta(n \log n)$ running time, so it is fast. It is stable
with the first definition of halve, but not with the second definition. However,
Mergesort is not smooth, taking $\Theta(n \log n)$ steps even when the input is already
sorted.

Returning to mktree, it is possible to build a size-balanced tree in linear time.
The method was covered in [1] as an example of the tupling technique, so we will
just sketch the idea and state the result. The idea is to avoid repeated halving, by
defining mkpair n xs = (mktree (take n xs), drop n xs). A direct recursive definition
of mkpair can then be derived, leading to

mktree xs = fst (mkpair (length xs) xs)
mkpair 0 xs = (Null, xs)
mkpair 1 xs = (Leaf (head xs), tail xs)
mkpair n xs = (Node t1 t2, zs)
  where (t1, ys) = mkpair m xs
        (t2, zs) = mkpair (n - m) ys
        m        = n div 2

The running time $T(n)$ of mkpair n satisfies $T(n) = 2T(n/2) + \Theta(1)$, with solution
$T(n) = \Theta(n)$.

There is another way of building a tree in linear time. Although the result will 
not be a size-balanced tree, it will be good enough to ensure that the corresponding 
version of Mergesort still has optimal asymptotic time complexity. The idea is to 
switch from a divide-and-conquer scheme to a bottom-up scheme. First, the list 
of elements is converted into a list of leaves. Provided this list is not empty or a 
singleton list, it is then halved in size by combining adjacent pairs of trees into 
larger trees. The halving process is repeated until only one tree is left: 

mktree [] = Null
mktree xs = unwrap (until single (pairWith Node) (map Leaf xs))

pairWith f [] = []
pairWith f [x] = [x]
pairWith f (x : y : xs) = f x y : pairWith f xs

The functions unwrap and single (and wrap, used below) were defined in Exercise 1.3.
The running time $T(n)$ of this version of mktree satisfies

$$T(n) = T(n/2) + \Theta(n)$$

Thus $T(n) = \Theta(n)$, the same as before. If we use this bottom-up version of mktree
in the definition of msort, then we can synthesise another definition:

msort [] = []
msort xs = unwrap (until single (pairWith merge) (map wrap xs))

This synthesis is more interesting than the previous one, and the details are provided
in the exercises. This version of Mergesort, called Bottom-up Mergesort (and sometimes
Straight Mergesort), converts the input into a list of singleton lists and then
repeatedly merges those lists in pairs until a single list is left. To time this version of
msort assume that the length n of the input is a power of two. The first pass involves
repeatedly merging two singleton lists, taking at most $2 \times n/2 = n$ comparisons.
The second pass involves repeatedly merging two lists of length two, taking at most
$4 \times n/4 = n$ comparisons, and so on. That gives a total of at most $kn$ comparisons,
where $n = 2^k$. For general n we have that Bottom-up Mergesort takes $\Theta(n \log n)$
comparisons.
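
Bottom-up Mergesort can be run as the following self-contained sketch. The helpers wrap, unwrap and single come from Exercise 1.3, which is not reproduced here, so the definitions below are assumed versions written to match the way they are used in the text.

```haskell
-- Assumed definitions of the Exercise 1.3 helpers.
wrap :: a -> [a]
wrap x = [x]

unwrap :: [a] -> a
unwrap [x] = x
unwrap _ = error "unwrap: not a singleton list"

single :: [a] -> Bool
single [_] = True
single _ = False

-- Combine adjacent pairs, leaving a trailing odd element untouched.
pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith _ [] = []
pairWith _ [x] = [x]
pairWith f (x : y : xs) = f x y : pairWith f xs

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Start from singleton lists and merge in pairs until one list remains.
msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = unwrap (until single (pairWith merge) (map wrap xs))
```

Each application of pairWith merge roughly halves the number of lists, so until stops after about log n passes.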

Looking deeper into Bottom-up Mergesort, we can see that it is not essential to 
start with a list of singleton lists. Instead, we could split the input into nondecreasing 
runs and begin the merging process with that: 

msort [] = []
msort xs = unwrap (until single (pairWith merge) (runs xs))

The function runs splits a list into runs of nondecreasing values:

runs :: Ord a => [a] -> [[a]]
runs = foldr op []
  where op x [] = [[x]]
        op x ((y : xs) : xss) | x <= y    = (x : y : xs) : xss
                              | otherwise = [x] : (y : xs) : xss

The function runs processes the input from right to left. The next element is added
to the front of the current run if possible; otherwise it begins a new run. This version
of msort, called Smooth Mergesort (and sometimes Natural Mergesort), is smooth;
in particular, if the input is already sorted, it needs only $\Theta(n)$ comparisons to return
the input untouched. The Haskell Data.List function sort is similar, except that it
is more cunning and splits the input into both ascending runs and descending runs,
taking care to reverse the descending runs. Here is the definition:

runs [x] = [[x]]
runs (x : y : xs)
  | x <= y    = upruns y (x :) xs
  | otherwise = dnruns y [x] xs

upruns x f [] = [f [x]]
upruns x f (y : ys)
  | x <= y    = upruns y (f . (x :)) ys
  | otherwise = f [x] : runs (y : ys)

dnruns x xs [] = [x : xs]
dnruns x xs (y : ys)
  | x > y     = dnruns y (x : xs) ys
  | otherwise = (x : xs) : runs (y : ys)

This time, runs processes the input from left to right. The second argument of both 
upruns and dnruns is an accumulating parameter, a list in the case of dnruns and a 
function in the case of upruns. The first argument of dnruns is the minimum value 
encountered so far, while the first argument of upruns is the maximum. In the case 
of dnruns, if the next value in the input is strictly smaller than the minimum, then 
the minimum is added to the front of the run, and the new value becomes the current
minimum. Thus, the runs produced by dnruns are in strictly increasing order. In
the case of upruns, if the next value is no smaller than the maximum, then the
maximum is added, in effect, to the back of the run, and the next value becomes the
new maximum. Thus, the runs produced by upruns are in weakly increasing order.
The asymmetry between dnruns and upruns, the first producing strictly increasing
runs and the second only weakly increasing runs, has as a consequence that the
resulting sorting algorithm is stable. We omit a formal proof, but simply give an
example. Using a, b, and c to indicate relative order in the original list, we have

runs $[6, 4, 3_a, 2_a, 2_b, 1_a, 1_b, 2_c, 1_c, 3_b]$
  = $[[2_a, 3_a, 4, 6], [1_a, 2_b], [1_b, 2_c], [1_c, 3_b]]$

Merging these lists in pairs, and then merging the results, produces the sorted list

$[1_a, 1_b, 1_c, 2_a, 2_b, 2_c, 3_a, 3_b, 4, 6]$

which is the required order for a stable sort.
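
The example can be replayed with the sketch below. The type Tagged, whose comparisons look only at the number and ignore the letter, and the function untag are hypothetical devices for observing stability; unwrap and single are assumed versions of the Exercise 1.3 helpers; and the clause runs [] = [] is an addition that makes runs total.

```haskell
-- A number tagged with a letter; equality and ordering ignore the tag.
data Tagged = Tagged Int Char

instance Eq Tagged where
  Tagged m _ == Tagged n _ = m == n

instance Ord Tagged where
  compare (Tagged m _) (Tagged n _) = compare m n

untag :: Tagged -> (Int, Char)
untag (Tagged n c) = (n, c)

-- Split the input into ascending runs, reversing descending stretches.
runs :: Ord a => [a] -> [[a]]
runs [] = []                       -- added so that runs is total
runs [x] = [[x]]
runs (x : y : xs)
  | x <= y    = upruns y (x :) xs
  | otherwise = dnruns y [x] xs

upruns :: Ord a => a -> ([a] -> [a]) -> [a] -> [[a]]
upruns x f [] = [f [x]]
upruns x f (y : ys)
  | x <= y    = upruns y (f . (x :)) ys
  | otherwise = f [x] : runs (y : ys)

dnruns :: Ord a => a -> [a] -> [a] -> [[a]]
dnruns x xs [] = [x : xs]
dnruns x xs (y : ys)
  | x > y     = dnruns y (x : xs) ys
  | otherwise = (x : xs) : runs (y : ys)

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith f (x : y : xs) = f x y : pairWith f xs
pairWith _ xs = xs

-- Assumed versions of the Exercise 1.3 helpers.
single :: [a] -> Bool
single [_] = True
single _ = False

unwrap :: [a] -> a
unwrap [x] = x
unwrap _ = error "unwrap: not a singleton list"

msort :: Ord a => [a] -> [a]
msort [] = []
msort xs = unwrap (until single (pairWith merge) (runs xs))

-- The example list from the text; the untagged 6 and 4 are given tag 'a'.
example :: [Tagged]
example = zipWith Tagged [6, 4, 3, 2, 2, 1, 1, 2, 1, 3] "aaaabababb"
```

Here map (map untag) (runs example) yields the four runs shown above, and map untag (msort example) lists equal numbers in the order a, b, c of their tags.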

Four versions of Mergesort have been covered in this section, all of which have 
relatively short definitions. The final one is probably the best, being fast, smooth, 
and stable. As we said above, it is this definition of Mergesort that is provided 
in the Haskell library Data.List. Well, not quite. The Haskell definition is more 
general in that the comparison test is provided as an extra argument. To motivate the 
definition, consider how you would sort a list into descending rather than ascending 
order. All the algorithms above have tacitly assumed that sorting means sorting 
into ascending order. One can of course define sortDown = reverse ■ sort, but this 
definition involves another traversal and that adds overhead. There are other kinds 
of comparison between elements that one can think of. For instance, we might 
conceivably want to sort a list of numbers so that all the even ones come first. To 
achieve this generality, Data.List provides two more functions

    sortBy :: (a -> a -> Ordering) -> [a] -> [a]
    sortOn :: Ord b => (a -> b) -> [a] -> [a]

The type Ordering is defined in the Standard Prelude: 

    data Ordering = LT | EQ | GT

The function compare is a method in the type class Ord and has the type

    compare :: Ord a => a -> a -> Ordering

For example, compare 3 4 = LT. As two examples,

    sortBy compare [3,1,4] = [1,3,4]
    sortBy (flip compare) [3,1,4] = [4,3,1]

The function sortBy cmp sorts according to the weird even-odd ordering described 
above, where cmp is defined by 

    cmp x y = compare (odd x, x) (odd y, y)

This definition exploits the fact that False < True in Haskell. For example

    sortBy cmp [1..10] = [2,4,6,8,10,1,3,5,7,9]

The variant sortOn sorts according to the values of a given function. For example, 
sortOn fst sorts a list of pairs in order of first components. Exercise 5.12 asks you to
define this variant. We will need sortBy and sortOn later on in the book, but for the 
while we continue with the assumption that sorting means sorting into ascending 
order. 
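
As a quick check of the examples above, here is a minimal sketch that uses the real Data.List functions. The names descending, evensFirst, and byFirst are introduced here for illustration only and do not appear in the text.

```haskell
import Data.List (sortBy, sortOn)

-- The even-odd comparison from the text: the Boolean (odd x) is
-- compared first, and False < True, so evens precede odds.
cmp :: Int -> Int -> Ordering
cmp x y = compare (odd x, x) (odd y, y)

-- Sorting into descending order without a separate reverse pass.
descending :: [Int] -> [Int]
descending = sortBy (flip compare)

-- All even numbers first, each group in ascending order.
evensFirst :: [Int] -> [Int]
evensFirst = sortBy cmp

-- Sorting pairs on their first components only; pairs with equal
-- first components keep their original relative order.
byFirst :: [(Int, String)] -> [(Int, String)]
byFirst = sortOn fst
```

In GHCi, evensFirst [1..10] should reproduce the list shown above, and byFirst leaves pairs with equal keys in their input order because the library sort is stable.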
Our next sorting algorithm, Heapsort, involves a different kind of tree from that in
the previous section, one that is identical to the type of binary search trees except 
that node labels are placed before the two subtrees: 

    data Tree a = Null | Node a (Tree a) (Tree a)

We can flatten such a tree by 

    flatten :: Ord a => Tree a -> [a]
    flatten Null = []
    flatten (Node x u v) = x : merge (flatten u) (flatten v)

By definition a tree is a heap if flattening it produces a list in nondecreasing order. 
Thus a heap is a tree in which the value at a node is no larger than the values in 
either of its subtrees. If the tree is size-balanced, then flattening it takes O(n log n)
steps for a tree of size n. 

It is easy enough to build a size-balanced heap in linearithmic time (i.e. O(n log n)
steps for a list of length n) - see Exercise 5.13. However, as we will now show, it is 
possible to build such a heap in linear time. The idea is to compute mkheap as the 
composition of two other functions: mktree, which builds a size-balanced tree, and 
heapify, which reorganises the labels to ensure the heap condition. We will leave 
the linear-time implementation of mktree as another exercise and concentrate on 
heapify. The definition starts out easily enough: 

    heapify :: Ord a => Tree a -> Tree a
    heapify Null = Null
    heapify (Node x u v) = siftDown x (heapify u) (heapify v)

It remains to define siftDown. This function is another smart constructor, taking a 
value and two heaps and building a heap by sifting the value downwards until the 
heap property is restored: 

    siftDown :: Ord a => a -> Tree a -> Tree a -> Tree a
    siftDown x Null Null = Node x Null Null
    siftDown x (Node y u v) Null
      | x <= y    = Node x (Node y u v) Null
      | otherwise = Node y (siftDown x u v) Null
    siftDown x Null (Node y u v)
      | x <= y    = Node x Null (Node y u v)
      | otherwise = Node y Null (siftDown x u v)
    siftDown x (Node y ul ur) (Node z vl vr)
      | x <= min y z = Node x (Node y ul ur) (Node z vl vr)
      | y <= min x z = Node y (siftDown x ul ur) (Node z vl vr)
      | z <= min x y = Node z (Node y ul ur) (siftDown x vl vr)
Note that heapify does not change the structure of a tree, so heapify returns a
size-balanced tree if given one. To show that heapify takes Θ(n) steps when applied
to a size-balanced tree t of size n, observe that siftDown is applied to every subtree
of t, including t itself, and in the worst case can take time proportional to the height
of the subtree. Suppose t has height h. Then t has one subtree of height h, at most
two subtrees of height h−1, at most four subtrees of height h−2, and so on. Hence
the total running time of heapify is proportional to at most

\[ h + 2(h-1) + 4(h-2) + \cdots + 2^{h-1} = \sum_{k=1}^{h} 2^{h-k}\,k = 2^{h} \sum_{k=1}^{h} \frac{k}{2^{k}} < 2^{h+1} \]

Finally, a size-balanced tree of size n has height O(log n) (in fact it has the minimum
possible height ⌈log (n + 1)⌉; see the exercises), and so T(n) = O(n).
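
For readers who want to check the bound, the sum has an exact closed form. The short derivation below is an addition, not part of the original argument, and S(h) is a name introduced here for the sum. It uses the standard identity for the partial sums of k/2^k.

```latex
\begin{align*}
S(h) &= \sum_{k=1}^{h} 2^{h-k}\,k
      = 2^{h} \sum_{k=1}^{h} \frac{k}{2^{k}} \\
     &= 2^{h} \left( 2 - \frac{h+2}{2^{h}} \right)
      = 2^{h+1} - h - 2 \;<\; 2^{h+1}.
\end{align*}
```

For instance, with h = 3 the sum is 3 + 2·2 + 4·1 = 11, which agrees with 2^4 − 3 − 2 = 11.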

Here, finally, is the definition of Heapsort: 

    hsort :: Ord a => [a] -> [a]
    hsort = flatten . heapify . mktree

A tree is built in O(n) steps, heapified in O(n) steps, and flattened in O(n log n)
steps. Thus hsort takes O(n log n) steps. Unlike Quicksort or Mergesort, the tree
cannot be eliminated by fusing the component functions, so Heapsort necessarily 
involves building a tree. In an imperative setting, when the input is given in an array, 
the tree can be stored in the array by juggling with array indices, so Heapsort can be 
made in-place. Heapsort is fast, but not smooth, stable, or compact. On random data 
it turns out to be somewhat slower than the best version of Mergesort, so we still 
have a champion. On the other hand, heaps are useful for other purposes, including 
the implementation of priority queues, a topic we will take up in Chapter 8. 
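
To see the pieces of Heapsort working together, here is a self-contained sketch. Two parts are assumptions rather than the text's final choices: merge is the usual stable merge of two sorted lists, and mktree is the simple definition given as the starting point of Exercise 5.15, which is not linear time. The last guard of siftDown is written as otherwise, which is equivalent to z <= min x y for a total order.

```haskell
-- Binary trees with the label placed before the two subtrees.
data Tree a = Null | Node a (Tree a) (Tree a)

-- Usual merge of two sorted lists; ties favour the left list.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

flatten :: Ord a => Tree a -> [a]
flatten Null         = []
flatten (Node x u v) = x : merge (flatten u) (flatten v)

-- Simple size-balanced tree builder (not linear time, because of
-- the repeated length, take and drop).
mktree :: [a] -> Tree a
mktree []     = Null
mktree (x:xs) = Node x (mktree (take m xs)) (mktree (drop m xs))
  where m = length xs `div` 2

-- Smart constructor: given a value and two heaps, build a heap.
siftDown :: Ord a => a -> Tree a -> Tree a -> Tree a
siftDown x Null Null = Node x Null Null
siftDown x (Node y u v) Null
  | x <= y    = Node x (Node y u v) Null
  | otherwise = Node y (siftDown x u v) Null
siftDown x Null (Node y u v)
  | x <= y    = Node x Null (Node y u v)
  | otherwise = Node y Null (siftDown x u v)
siftDown x (Node y ul ur) (Node z vl vr)
  | x <= min y z = Node x (Node y ul ur) (Node z vl vr)
  | y <= min x z = Node y (siftDown x ul ur) (Node z vl vr)
  | otherwise    = Node z (Node y ul ur) (siftDown x vl vr)

heapify :: Ord a => Tree a -> Tree a
heapify Null         = Null
heapify (Node x u v) = siftDown x (heapify u) (heapify v)

hsort :: Ord a => [a] -> [a]
hsort = flatten . heapify . mktree
```

For example, hsort [3,1,2] builds Node 3 (Node 1 Null Null) (Node 2 Null Null), sifts 3 below 1, and flattens the result to [1,2,3].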


5.4 Bucketsort and Radixsort 

We turn now to two completely different kinds of sorting algorithm. Neither is based 
on comparisons between elements of some arbitrary type, so sorting no longer has 
the type sort :: Ord a => [a] -> [a]. Instead these two algorithms exploit the structure
of the elements to be sorted. To set the scene, consider sorting a list of words, where 
a word is a list of alphabetic characters. That means sorting the words into lexical 
order. We can use Mergesort for this purpose: 

    type Word = [Char]
    sortWords :: [Word] -> [Word]
    sortWords = msort

As we have seen, sorting a list of n words this way involves O(n log n) comparisons
between words. However, comparing two words does not take constant time. If there 
are at most k letters in each word, then comparing them will take O(k) character
comparisons in the worst case. That means the true cost of computing sortWords
is O(kn log n) steps. The two algorithms in this section reduce the cost to O(kn)
steps.

The way we will view the problem is to think of a word as containing fields of
information, the fields being the first character of the word, the second character, 
and so on. Similarly, an integer can be defined in terms of the fields of its decimal 
digits. These fields can be extracted by providing a list of total functions, which 
we will call discriminators, each of which extracts one possible field. For instance 
with decimals of length k, there will be k discriminators, one for each decimal digit. 
Given a list of discriminators, two elements x and y are lexically ordered if the 
following test returns True: 

    ordered :: Ord b => [a -> b] -> a -> a -> Bool
    ordered [] x y = True
    ordered (d:ds) x y = (d x < d y) || ((d x == d y) && ordered ds x y)

In this formulation of the problem, the fields themselves have to be elements from 
some ordered type. 
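
The test can be tried out directly. In the sketch below, wordDiscs is a hypothetical helper, not something defined in the text: it builds k discriminators for words and pads short words with spaces so that each discriminator is a total function.

```haskell
-- Lexical ordering test with respect to a list of discriminators.
ordered :: Ord b => [a -> b] -> a -> a -> Bool
ordered []     _ _ = True
ordered (d:ds) x y = d x < d y || (d x == d y && ordered ds x y)

-- Hypothetical discriminators for words of at most k letters.
-- The i-th discriminator returns the i-th character; words that
-- are too short are padded with spaces, which precede all letters.
wordDiscs :: Int -> [String -> Char]
wordDiscs k = [\w -> (w ++ repeat ' ') !! i | i <- [0 .. k - 1]]
```

With these definitions, ordered (wordDiscs 3) "cat" "cot" is True, while swapping the two words gives False.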

The obvious way to sort a list of words is to divide them into piles, or buckets,
according to their first letter. Each bucket is then sorted in the same way,
but on the second letter, and so on. At the end of this process there will be lots
of little buckets, each containing a single word. These buckets then have to be
combined in the right order to give the final sorted list. The simplest way to
implement this idea is in terms of a tree, and the kind of tree we will need is as
follows:

    data Tree a = Leaf a | Node [Tree a]

This kind of tree is sometimes called a rose tree. A rose tree therefore is either a 
leaf containing a value, or a node that can have an arbitrary list of subtrees. We can 
build a rose tree by 

    mktree :: (Bounded b, Enum b, Ord b) => [a -> b] -> [a] -> Tree [a]
    mktree [] xs = Leaf xs
    mktree (d:ds) xs = Node (map (mktree ds) (ptn d xs))

A rose tree is built by partitioning the list into buckets according to the first field. 
Each bucket is then converted into a tree by applying the algorithm recursively 
to the remaining fields. Later on we will modify this definition in the interests of 
efficiency. The reason for the type-class constraints on mktree will become apparent 
in a moment. 

The function ptn partitions a list into buckets according to the field extracted by 
the discriminator: 
    ptn :: (Bounded b, Enum b, Ord b) => (a -> b) -> [a] -> [[a]]
    ptn d xs = [filter (\x -> d x == m) xs | m <- rng]
      where rng = [minBound .. maxBound]

The definition of rng explains why the result type of a discriminator has to be both
bounded and enumerable. The definition of ptn is very inefficient, and for two quite
separate reasons. Firstly, rng may be a very long list. For example, since Haskell's
Char type represents all Unicode characters, the length of rng :: [Char] is over a
million. Secondly, the result of ptn is computed by repeatedly traversing the input.
If rng has length r and the input has length n, then ptn requires r × n evaluations of
the discriminator. We will address this problem of efficiency later on.

Having built a tree, we can now flatten it: 

    flatten :: Tree [a] -> [a]
    flatten (Leaf xs) = xs
    flatten (Node ts) = concatMap flatten ts

The resulting sorting algorithm is known as Bucketsort: 
    bsort ds xs = flatten (mktree ds xs)

Well, not quite. In Bucketsort as traditionally presented there are no trees. We should 
be familiar with this situation by now, and the next step is to eliminate the
intermediate tree. The base case bsort [] xs = xs is easy. For the induction step we reason

      bsort (d:ds) xs
    =   { definition of bsort }
      flatten (mktree (d:ds) xs)
    =   { definition of mktree }
      flatten (Node (map (mktree ds) (ptn d xs)))
    =   { definition of flatten }
      concatMap (flatten . mktree ds) (ptn d xs)
    =   { definition of bsort }
      concatMap (bsort ds) (ptn d xs)

Hence we have shown 
    bsort [] xs = xs
    bsort (d:ds) xs = concatMap (bsort ds) (ptn d xs)

So far, so good. But we can push the calculation one more step if we exploit a 
simple but important fact, namely that 

    map (bsort ds) (ptn d xs) = ptn d (bsort ds xs)

Informally, this identity asserts that ptn is stable: partitioning a sorted sequence 
yields a collection of sorted buckets. A detailed proof is given in the exercises. 
Assuming it holds, we can continue: 
      concatMap (bsort ds) (ptn d xs)
    =   { above stability property }
      concat (ptn d (bsort ds xs))

And that gives us Radixsort: 

    rsort :: (Bounded b, Enum b, Ord b) => [a -> b] -> [a] -> [a]
    rsort [] xs = xs
    rsort (d:ds) xs = concat (ptn d (rsort ds xs))

Whereas bsort sorts on the most significant field first, rsort sorts on the most 
significant field last. The difference is that in bsort the buckets have to be kept 
separate; sorting each bucket means dividing each bucket into further buckets, and 
so on. At the end there will be a long array of singleton buckets, which are only 
then reassembled into the final list. In rsort the buckets can be combined after each 
pass. Indeed, Radixsort was used in the early days of sorting punched cards with 
a mechanical sorter. The sorter was used to divide the cards into buckets on the 
least significant column of a card. The buckets were then carefully reassembled by 
a human sorter into a single deck of cards without changing the order of any of the 
cards in a single bucket or the order of the buckets themselves. The entire deck was 
then replaced in the sorter and sorted again on the next least significant digit, and so 
on. A much simpler process for a human to carry out. 
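
The agreement between the two algorithms can be observed on a small example. The sketch below uses the inefficient ptn based on minBound and maxBound. The type Digit and the list pairDiscs are introduced here purely for illustration, chosen so that the range of field values is short.

```haskell
-- A small bounded, enumerable field type, so that rng has length 4.
data Digit = D0 | D1 | D2 | D3
  deriving (Eq, Ord, Enum, Bounded, Show)

-- One bucket per possible field value, in increasing field order.
ptn :: (Bounded b, Enum b, Ord b) => (a -> b) -> [a] -> [[a]]
ptn d xs = [filter (\x -> d x == m) xs | m <- rng]
  where rng = [minBound .. maxBound]

-- Bucketsort: most significant field first, buckets kept separate.
bsort :: (Bounded b, Enum b, Ord b) => [a -> b] -> [a] -> [a]
bsort []     xs = xs
bsort (d:ds) xs = concatMap (bsort ds) (ptn d xs)

-- Radixsort: least significant field first, buckets recombined
-- after every pass.
rsort :: (Bounded b, Enum b, Ord b) => [a -> b] -> [a] -> [a]
rsort []     xs = xs
rsort (d:ds) xs = concat (ptn d (rsort ds xs))

-- Two-field keys, listed with the most significant field first.
pairDiscs :: [(Digit, Digit) -> Digit]
pairDiscs = [fst, snd]
```

Applied to [(D2,D1),(D0,D3),(D2,D0),(D1,D1),(D0,D0)], both bsort pairDiscs and rsort pairDiscs should return the same lexically sorted list, although they visit the fields in opposite orders.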

Let us now return to the problem of computing ptn efficiently. To avoid a
potentially huge range of values, most of which will probably not occur in a given
field, it is best to make the range (l, u) of values in a field explicit. We will assume
for simplicity that all fields have the same range. Second, we suppose that field 
elements are of a type that can be array indices, which is to say they are elements of 
the type class Ix. Then we can avoid multiple traversals of the input by accumulating 
the elements into an array: 

    ptn :: Ix b => (b,b) -> (a -> b) -> [a] -> [[a]]
    ptn (l,u) d xs = elems xa
      where xa = accumArray snoc [] (l,u) (zip (map d xs) xs)
            snoc xs x = xs ++ [x]

It is important for the stability property above that ptn should ensure that the order 
in each bucket is the same as in the input, which explains why array entries are 
computed by adding a new value to the end of a list via snoc. But snoc is not a 
constant-time operation, so building the array can take O(n²) steps in the worst case,
when all the elements go into the same slot. One solution is to use symmetric lists; 
another is to insert elements in reverse order, and then to reverse each list when 
extracting the array elements: 
    ptn (l,u) d xs = map reverse (elems xa)
      where xa = accumArray (flip (:)) [] (l,u) (zip (map d xs) xs)

This version of ptn takes Θ(n) steps for an input of length n. Here is the revised
definition of rsort that uses the new ptn: 

    rsort :: Ix b => (b,b) -> [a -> b] -> [a] -> [a]
    rsort bb [] xs = xs
    rsort bb (d:ds) xs = concat (ptn bb d (rsort bb ds xs))

If there are k discriminators, then rsort takes Θ(kn) steps because there are k calls
of ptn, each of which takes Θ(n) steps, and one application of concat, which also
takes linear time. 
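
Here is a runnable sketch of the array-based version, using accumArray and elems from Data.Array. The discriminator list pairDiscs and the choice of pairs of small integers as keys are assumptions made for the example only.

```haskell
import Data.Array (Ix, accumArray, elems)

-- Partition into buckets indexed by field value. Elements are
-- consed onto the front of each bucket and every bucket is reversed
-- at the end, so the input order is preserved within a bucket.
ptn :: Ix b => (b, b) -> (a -> b) -> [a] -> [[a]]
ptn (l, u) d xs = map reverse (elems xa)
  where xa = accumArray (flip (:)) [] (l, u) (zip (map d xs) xs)

-- Radixsort over an explicit range of field values.
rsort :: Ix b => (b, b) -> [a -> b] -> [a] -> [a]
rsort _  []     xs = xs
rsort bb (d:ds) xs = concat (ptn bb d (rsort bb ds xs))

-- Hypothetical example: pairs of digits, most significant first.
pairDiscs :: [(Int, Int) -> Int]
pairDiscs = [fst, snd]
```

For instance, ptn (0,2) (`mod` 3) [5,3,4,0,2,1] gives [[3,0],[4,1],[5,2]], each bucket in input order, which is exactly the stability that the derivation of rsort relies on.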

Finally, let us specialise rsort to the case where the input is a nonempty list of 
natural numbers. Each number is converted into a list of digits by applying show, 
and then padded with leading zeros to ensure that each decimal has the same length. 
We can define the discriminator functions by

    discs :: [Nat] -> [Nat -> Char]
    discs xs = [\x -> pad k (show x) !! i | i <- [0 .. k-1]]
      where k = maximum (map (length . show) xs)
            pad k xs = replicate (k - length xs) '0' ++ xs

And now we have 

    irsort :: [Nat] -> [Nat]
    irsort xs = rsort ('0', '9') (discs xs) xs

How does irsort compare with the current champion, Smooth Mergesort? As one
experiment, we sorted a list of a million randomly generated integers, all in the
range (0,10000), so there are five discriminators. Radixsort took 65% of the time
of Smooth Mergesort. In fact, Smooth Mergesort only begins to pull ahead when
there are eight or more discriminators.
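
The integer specialisation can be assembled into one self-contained sketch. It assumes that Nat is represented by Int restricted to non-negative values, and it repeats the array-based ptn and rsort so that the block stands alone. As in the text, the input must be nonempty, since maximum is undefined on the empty list.

```haskell
import Data.Array (Ix, accumArray, elems)

-- Assumption: natural numbers are represented by non-negative Ints.
type Nat = Int

ptn :: Ix b => (b, b) -> (a -> b) -> [a] -> [[a]]
ptn (l, u) d xs = map reverse (elems xa)
  where xa = accumArray (flip (:)) [] (l, u) (zip (map d xs) xs)

rsort :: Ix b => (b, b) -> [a -> b] -> [a] -> [a]
rsort _  []     xs = xs
rsort bb (d:ds) xs = concat (ptn bb d (rsort bb ds xs))

-- One discriminator per decimal position, most significant first.
-- Every number is shown in decimal and padded with leading zeros
-- to the width k of the longest number in the input.
discs :: [Nat] -> [Nat -> Char]
discs xs = [\x -> pad k (show x) !! i | i <- [0 .. k - 1]]
  where k = maximum (map (length . show) xs)
        pad n s = replicate (n - length s) '0' ++ s

-- Radixsort on a nonempty list of natural numbers.
irsort :: [Nat] -> [Nat]
irsort xs = rsort ('0', '9') (discs xs) xs
```

For the input [170,45,75,90,802,24,2,66] there are three discriminators, and irsort processes the units digit first, then the tens, then the hundreds.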


5.5 Sorting sums 

We end the chapter by taking a look at a famous unsolved problem connected with 
sorting. Suppose A is some ordered type and + is some monotonic binary operation 
on A, so 

    x ≤ x′ ∧ y ≤ y′ ⟹ x + y ≤ x′ + y′

In concrete code we take A to be a synonym for Integer and + to be numerical 
addition. Consider the problem of computing sortsums, where 

    sortsums :: [A] -> [A] -> [A]
    sortsums xs ys = sort [x + y | x <- xs, y <- ys]
Supposing both lists have length n, what is asymptotically the best possible running
time of any algorithm for computing sortsums? There are n² sums, so sorting
involves Ω(n² log n) comparisons on elements of A in the worst case. The bound
Ω(n² log n) does not depend on + being monotonic but, even if it is, the bound is
the same. We will prove this fact below.

But now suppose we assume more about + and A, specifically that (A, +) is an 
Abelian group. Thus + is associative and commutative, with an identity element we 
will write as 0, and an operation negate such that x + negate x = 0. For example, the 
integers form an Abelian group under addition. What is the best bound in this case? 
The answer is that nobody knows. It cannot be better than O(n²) because it takes n²
steps to produce the answer, but there is still a gap between O(n²) and O(n² log n).

What if additional properties of A were assumed? After all, integers have more 
structure than just being an Abelian group under addition. Integers can be multiplied 
as well as added - they form an algebraic ring. Does that help? Again, nobody 
knows. It remains an open problem, some 40 years after it was first posed in [6], as 
to whether the total cost of computing sortsums can be reduced to O(n²) steps.

However, some progress has been made. In particular, Jean-Luc Lambert [9] 
proved that, if (A,+) is an Abelian group, then sortsums can be computed with 
O(n²) comparisons between elements of A. His algorithm is another nifty example
of divide and conquer, and we describe it below. However, Lambert's algorithm
does require C n² log n additional operations, including other 'housekeeping'
comparisons; moreover C is quite large. Thus the total running time does not beat the
O(n² log n) bound.

Here is a proof that Ω(n² log n) is a lower bound on sortsums when the only
assumption made is that + is monotonic. Suppose xs and ys are both sorted into
increasing order and consider the n × n matrix [[x + y | y <- ys] | x <- xs]. Each row
and column of the matrix is therefore in increasing order. The grid is of the same 
kind that we saw in two-dimensional search in the previous chapter. The matrix 
is an example of what is known as a standard Young tableau, and it follows from 
Theorem H of Section 5.1.4 of [8] that there are precisely 


\[ E(n) = \frac{(n^2)!}{\dfrac{(2n-1)!}{(n-1)!}\cdot\dfrac{(2n-2)!}{(n-2)!}\cdots\dfrac{n!}{0!}} \]

ways of assigning the values 1 to n² to the elements of the matrix. Each such
assignment determines a potential permutation that can sort the input, so in the 
associated decision tree there have to be at least E(n) leaves. Using the fact that 
log E(n) = Ω(n² log n), we conclude that at least this number of comparisons
between elements of A is required. 

Now for the meat of the exercise. Lambert’s algorithm depends on two simple 
facts. Define the subtraction operation (−) by x − y = x + negate y. Then we have

    x + y = x − negate y                          (5.1)

and

    x − y ≤ x′ − y′  ⟺  x − x′ ≤ y − y′           (5.2)

Verification of (5.1) is easy, but (5.2), which we leave as an exercise, requires all 
the properties of an Abelian group. Here in outline is how (5.1) and (5.2) are used. 
First, we use fact (5.1) to sort subtractions rather than sums: 

    sortsums xs ys = sortsubs xs (map negate ys)
    sortsubs xs ys = sort [x - y | x <- xs, y <- ys]

Second, we use fact (5.2) to compute sortsubs xs ys by computing instead the
two lists xxs = sortsubs xs xs and yys = sortsubs ys ys. By using a divide-and-conquer
scheme, Lambert showed how the two lists xxs and yys can be computed
with only O(n²) comparisons between elements of A. The two lists can
be merged to give sortsubs xs ys, but crucially without any further comparisons
on elements of A. Since x − y ≤ x′ − y′ precisely in the case that x − x′
precedes y − y′ in the merged list, the merged list can be computed using precedence
comparisons only, that is, comparisons between suitable integer labels, not elements
of A.
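
Both facts are easy to test on the integers. In the sketch below, viaSubs and fact52 are names introduced here for illustration. The first expresses the use of fact (5.1), the second states fact (5.2) as a Boolean property, and sortsums and sortsubs are the direct specifications that sort all n² values.

```haskell
import Data.List (sort)

type A = Integer

-- Specifications: sort all sums, or all subtractions.
sortsums :: [A] -> [A] -> [A]
sortsums xs ys = sort [x + y | x <- xs, y <- ys]

sortsubs :: [A] -> [A] -> [A]
sortsubs xs ys = sort [x - y | x <- xs, y <- ys]

-- Fact (5.1) lifted to lists: sums are subtractions of negations.
viaSubs :: [A] -> [A] -> [A]
viaSubs xs ys = sortsubs xs (map negate ys)

-- Fact (5.2) as a property of four group elements.
fact52 :: A -> A -> A -> A -> Bool
fact52 x y x' y' = (x - y <= x' - y') == (x - x' <= y - y')
```

For any integer lists xs and ys, viaSubs xs ys should coincide with sortsums xs ys, and fact52 should hold for every choice of arguments.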

Let us deal first with the merging step. We label values of type A with natural 
numbers and change the definition of sortsubs to read 

    sortsubs xs ys = map fst (sortWith abs xis yis)
      where xis = zip xs [0 .. n-1]
            yis = zip ys [n ..]
            n   = length xs
            abs = merge (sortsubs1 xis) (sortsubs1 yis)

The elements of the two lists xs and ys are given distinct labels, from 0 upwards.
With Label a type synonym for natural numbers, and Pair a synonym for pairs of 
labels, the two component sorting functions in the new definition of sortsubs have 
types 

    sortsubs1 :: [(A,Label)] -> [(A,Pair)]
    sortWith :: [(A,Pair)] -> [(A,Label)] -> [(A,Label)] -> [(A,Pair)]

The first function, sortsubs1, is defined in the first instance by

    sortsubs1 xis = sort (subs xis xis)
    subs xis yis = [(x - y, (i,j)) | (x,i) <- xis, (y,j) <- yis]

Thus sortsubs1 sorts subtractions over a single labelled list, retaining label
information to show the origin of the subtraction. For example, the element (6, (11,3))
records that 6 is the result of subtracting the element in position 3 in xis from the
element in position 11. As defined, sortsubs1 takes O(n² log n) steps. Below we
show how the number of comparisons between elements of A can be reduced to O(n²).

We deal with sortWith next. The definition, which uses both the Haskell function
sortBy and an array, is as follows:

    sortWith abs xis yis = sortBy cmp (subs xis yis)
      where cmp (_,(i,k)) (_,(j,l)) = compare (a ! (i,j)) (a ! (k,l))
            a = array bs (zip labelPairs [0..])
            labelPairs = map snd abs
            bs = (minimum labelPairs, maximum labelPairs)

Now consider 

    abs = merge (sortsubs1 xis) (sortsubs1 yis)

Here map fst abs is a list of A-values in nondecreasing order, and map snd abs is
a list of corresponding pairs of labels. This second list is just what we need to
determine the result of a comparison test, for x_i − y_k ≤ x_j − y_l if and only if (i,j)
precedes (k,l) in the list labelPairs. We compute precedence information quickly
by creating an array indexed by pairs of labels whose entries are positive integers. 
Although sortWith depends on comparisons, the comparisons are between pairs of 
labels, not elements of A. 

It remains to compute a better version of sortsubs1, and it is here that divide and
conquer enters the picture. We make use of the identity

    sortsubs1 (xis ++ yis) =
      merge (merge (sortsubs1 xis) (sort (subs xis yis)))
            (merge (sort (subs yis xis)) (sortsubs1 yis))

Two of the subterms, sortsubs1 xis and sortsubs1 yis, are computed recursively,
and the results are merged to give abs. Next, sort (subs xis yis) is computed
via

    sort (subs xis yis) = sortWith abs xis yis

Finally, sort (subs yis xis) can be computed quickly from sort (subs xis yis) by negating
values, swapping labels, and reversing the list. The complete program is summarised 
in Figure 5.1. Counting comparisons between elements of A only, the number 
C(n) of comparisons to compute sortsubs1 on a list of length n satisfies C(n) =
2C(n/2) + Θ(n²) with solution C(n) = Θ(n²). However, the total time T(n) taken
to carry out the complete sorting is given by T(n) = 2T(n/2) + Θ(n² log n), with
solution T(n) = Θ(n² log n). The logarithmic factor can be removed from T(n) if
sortBy cmp in the definition of sortWith can be computed in quadratic time, but this 
result remains elusive. In any case, the additional complexity arising from replacing 
comparisons between elements of A by comparisons between integers makes the 
algorithm very inefficient in practice. 
    sortsums xs ys = sortsubs xs (map negate ys)

    sortsubs xs ys = map fst (sortWith abs xis yis)
      where xis = zip xs [0 .. n-1]
            yis = zip ys [n ..]
            n   = length xs
            abs = merge (sortsubs1 xis) (sortsubs1 yis)

    sortWith abs xis yis = sortBy cmp (subs xis yis)
      where cmp (_,(i,k)) (_,(j,l)) = compare (a ! (i,j)) (a ! (k,l))
            a = array bs (zip labelPairs [0..])
            labelPairs = map snd abs
            bs = (minimum labelPairs, maximum labelPairs)

    subs xis yis = [(x - y, (i,j)) | (x,i) <- xis, (y,j) <- yis]

    sortsubs1 [] = []
    sortsubs1 [(x,i)] = [(0,(i,i))]
    sortsubs1 xis = merge abs (merge cs ds)
      where abs = merge (sortsubs1 xis1) (sortsubs1 xis2)
            cs = sortWith abs xis1 xis2
            ds = reverse (map switch cs)
            (xis1,xis2) = splitAt (length xis `div` 2) xis

    switch (x,(i,j)) = (negate x, (j,i))


Figure 5.1 The complete program 
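
For experimentation, the program of Figure 5.1 can be made self-contained as sketched below. The imports, the type signatures, the definition of merge, and the reference function spec are assumptions added here. A is taken to be Integer and labels to be Int. The local name abs shadows the Prelude function of the same name, as in the figure.

```haskell
import Data.Array (array, (!))
import Data.List (sort, sortBy)

type A = Integer
type Label = Int
type Pair = (Label, Label)

-- Usual merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

sortsums :: [A] -> [A] -> [A]
sortsums xs ys = sortsubs xs (map negate ys)

sortsubs :: [A] -> [A] -> [A]
sortsubs xs ys = map fst (sortWith abs xis yis)
  where xis = zip xs [0 .. n - 1]
        yis = zip ys [n ..]
        n   = length xs
        abs = merge (sortsubs1 xis) (sortsubs1 yis)

-- Sort the subtractions of xis and yis using only the precedence
-- of label pairs recorded in abs, never comparing elements of A.
sortWith :: [(A, Pair)] -> [(A, Label)] -> [(A, Label)] -> [(A, Pair)]
sortWith abs xis yis = sortBy cmp (subs xis yis)
  where cmp (_, (i, k)) (_, (j, l)) = compare (a ! (i, j)) (a ! (k, l))
        a = array bs (zip labelPairs [0 :: Int ..])
        labelPairs = map snd abs
        bs = (minimum labelPairs, maximum labelPairs)

subs :: [(A, Label)] -> [(A, Label)] -> [(A, Pair)]
subs xis yis = [(x - y, (i, j)) | (x, i) <- xis, (y, j) <- yis]

sortsubs1 :: [(A, Label)] -> [(A, Pair)]
sortsubs1 []       = []
sortsubs1 [(_, i)] = [(0, (i, i))]
sortsubs1 xis      = merge abs (merge cs ds)
  where abs = merge (sortsubs1 xis1) (sortsubs1 xis2)
        cs  = sortWith abs xis1 xis2
        ds  = reverse (map switch cs)
        (xis1, xis2) = splitAt (length xis `div` 2) xis

switch :: (A, Pair) -> (A, Pair)
switch (x, (i, j)) = (negate x, (j, i))

-- Reference specification, used only for comparison.
spec :: [A] -> [A] -> [A]
spec xs ys = sort [x + y | x <- xs, y <- ys]
```

For example, sortsums [1,2] [10,20] labels the first list 0 and 1 and the negated second list 2 and 3, and returns [11,12,21,22], the same as spec. The array a has entries only for the label pairs that occur in abs; the remaining entries are undefined but are never inspected.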


5.6 Chapter notes 

The ultimate source of information about sorting is Knuth’s comprehensive treatment 
in [8]. Quicksort was invented by Tony Hoare, see [7]. Mergesort is even older 
and goes back to John von Neumann, who first suggested the method in 1945. 
The Haskell implementation of Mergesort is attributed to Lennart Augustsson and 
Thomas Nordin. Heapsort was discovered by J. W. J. Williams [10], although an 
earlier version, simply called Treesort, was due to Robert W. Floyd. The derivation 
of Radixsort is due to Jeremy Gibbons [5]. 

The problem of sorting sums is Problem 41 in the Open Problems Project [2], 
a web resource devoted to recording open problems of interest to researchers 
in computational geometry and related fields. The earliest known reference to 
the problem is by Michael Fredman [4], who attributed the problem to Elwyn 
Berlekamp. All these references consider the problem in terms of numbers rather 
than Abelian groups, but the idea is the same. 
Exercises


Exercise 5.1 Four species of tree have been described in this chapter. Name the 
sorting algorithms associated with the following five tree structures, identifying the 
odd one out: 


    data Tree a = Null | Leaf a | Node (Tree a) (Tree a)
    data Tree a = Null | Node a (Tree a) (Tree a)
    data Tree a = Null | Node a [Tree a]
    data Tree a = Null | Node (Tree a) a (Tree a)
    data Tree a = Leaf a | Node [Tree a]


Exercise 5.2 Starting with qsort = flatten . mktree, where flatten is as defined in the
text, construct the definition of qsort (x:xs).

Exercise 5.3 Recall from the previous chapter the following definition of flatten in
which the concatenation operations have been eliminated:

    flatten t = flatcat t []
    flatcat Null xs = xs
    flatcat (Node l x r) xs = flatcat l (x : flatcat r xs)

Starting with qsort = flatten . mktree, synthesise a version of Quicksort for this
version of flatten. 

Exercise 5.4 Would it be sensible to choose as pivot the middle element of the list, 
that is, the one whose position is the middle position? How about the mean value, 
or the median? 

Exercise 5.5 Define minimum = head . qsort. How long does it take to compute the
minimum of a nonempty list this way?


More generally, define select k xs = (qsort xs) !! k. Synthesise a more efficient
definition. (This exercise will be answered in the following chapter.)

Exercise 5.6 Synthesise a space-efficient version of qsort by introducing two
accumulating parameters:

    qsort (x:xs) = help x xs [] []
      where help x xs us vs = qsort (us ++ ys) ++ [x] ++ qsort (vs ++ zs)
              where (ys,zs) = partition (<x) xs

Your task is simply to obtain a recursive definition of help. 

Exercise 5.7 The number of comparisons T{m,n) required by merge to merge two 
lists of lengths m and n in the worst case satisfies 

    T(0,n) = 0
    T(m,0) = 0
    T(m,n) = 1 + (T(m−1,n) max T(m,n−1))

Prove that T(m,n) ≤ m + n.

Exercise 5.8 In the recursive definition of msort, are the two base cases both 
necessary? 

Exercise 5.9 Synthesise the bottom-up version of Mergesort. You will need the 
following fusion law of until: 

    f . until p g = until q h . f

provided that f . g = h . f and p x ⟺ q (f x) for all x.

Exercise 5.10 What common functions do the following two expressions represent? 

    flip (foldl (\f x -> (x:) . f) id) []
    flip (foldr (\x f -> f . (x:)) id) []

Exercise 5.11 A playing card can be represented by two characters, the first being 
one of the letters of SHDC (Spades, Hearts, Diamonds, Clubs) and the second one of 
the letters of AKQJT98765432 (Ace, King, Queen, Jack, Ten, etc.). Bridge players 
sort their 13 cards from left to right in the order given by these lists. For example, 

[SA,SQ, S9, S8, S2,HK,H5,H3,H2,CA, CT, C7, C2] 

Bridge players would describe this hand as “Five spades to the Ace-Queen, four 
hearts to the King, void diamonds and four clubs to the Ace-Ten”. Be that as it may, 
use sortBy to sort a Bridge hand into order. 

Exercise 5.12 Using the Haskell Data.Ord function 

    comparing :: Ord b => (a -> b) -> a -> a -> Ordering
    comparing f x y = compare (f x) (f y)

give a simple definition of sortOn. Explain why this definition can be inefficient,
and use tupling to give a better version. 

Exercise 5.13 Give a divide-and-conquer algorithm for building a size-balanced 
heap in linearithmic time. 

Exercise 5.14 Given that we can build a heap in linearithmic time, it seems that 
another way to define Heapsort is simply to define 

    hsort = flatten . mkheap

In other words, build a heap and then flatten it. Show the result of eliminating the 
intermediate heap from this definition. Why did we not include this version of hsort 
in the text? 

Exercise 5.15 Show how to build a size-balanced tree, of the kind described in 
Heapsort, in linear time. Start with 
    mktree [] = Null
    mktree (x:xs) = Node x (mktree (take m xs)) (mktree (drop m xs))
      where m = length xs `div` 2

Exercise 5.16 Prove that a size-balanced tree, of the kind described in Heapsort, 
has minimum possible height ⌈log (n + 1)⌉, where n is the size of the tree.

Exercise 5.17 Suppose you are given a list of n integers, all of which lie in some 
given range (0,m). Show how to sort the list in Θ(m + n) steps.

Exercise 5.18 Here is the stability property of Bucketsort again: 

    map (bsort ds) (ptn d xs) = ptn d (bsort ds xs)          (5.3)

The aim of this exercise is to prove (5.3) using the original definition of bsort as a
function that flattens a tree. First, define the function 

    tmap :: (a -> b) -> Tree a -> Tree b

for mapping a function over a tree. This function will be needed below. 

Next, use the following subsidiary claims to prove (5.3): 

    mktree ds . filter p = tmap (filter p) . mktree ds       (5.4)
    flatten . tmap (filter p) = filter p . flatten           (5.5)

Next, use the following claim to prove (5.4): provided p is a total predicate, we have

    ptn d . filter p = map (filter p) . ptn d                (5.6)

Now prove (5.6). You can assume that filters of total predicates commute, so

    filter p . filter q = filter q . filter p

Finally, prove (5.5). That’s a lot of work for one exercise, but it does demonstrate 
that the pages of explanation found in some textbooks as to why Radixsort works 
can be reduced to simple calculation. 
Exercise 5.19 Specialise Radixsort to the problem of sorting a list of words, where
a word is made up of lower-case letters only. 

Exercise 5.20 Prove that, if (A, +) is an Abelian group, then

    x − y ≤ x′ − y′  ⟺  x − x′ ≤ y − y′


Answers 

Answer 5.1 Mergesort, Heapsort, odd one out, Quicksort, and Bucketsort.
Answer 5.2 We have 

      qsort (x:xs)
    =   { definition }
      flatten (mktree (x:xs))
    =   { definition of mktree with (ys,zs) = partition (<x) xs }
      flatten (Node (mktree ys) x (mktree zs))
    =   { definition of flatten }
      flatten (mktree ys) ++ [x] ++ flatten (mktree zs)
    =   { definition of qsort }
      qsort ys ++ [x] ++ qsort zs

Answer 5.3 Let qcat xs ys = flatcat (mktree xs) ys. Then we obtain

    qsort xs = qcat xs []
    qcat [] ys = ys
    qcat (x:xs) ys = qcat us (x : qcat vs ys)
      where (us,vs) = partition (<x) xs

Answer 5.4 No, choosing the middle element is as likely to produce two
unbalanced lists as choosing the first element. The only advantage of such a choice is
that if the input is already sorted into strictly increasing order, then qsort will take
Θ(n log n) steps rather than Θ(n²) steps. Choosing the mean value as pivot is better,
but of course the notion of a mean value depends on the input being a list of numbers, 
so this method is not available for an arbitrary ordered type. Choosing the median 
value as pivot would guarantee the list is evenly split, but of course such a choice 
depends on there being an efficient method for computing the median, a problem 
that is addressed in the following chapter. 

Answer 5.5 It could take quadratic time, for example when the input is in decreasing
order.
Answer 5.6 The base case

    help x [] us vs = qsort us ++ [x] ++ qsort vs

is easy. For the recursive case assume y < x. We argue as follows:

      help x (y:xs) us vs
    =   { with (ys,zs) as above }
      qsort (us ++ y : ys) ++ [x] ++ qsort (vs ++ zs)
    =   { claim (see below) }
      qsort (y : us ++ ys) ++ [x] ++ qsort (vs ++ zs)
    =   { definition of help }
      help x xs (y:us) vs

The claim is that the result of qsort is unchanged when the input is permuted. A dual
calculation in the remaining case yields

    qsort [] = []
    qsort (x:xs) = help xs [] []
      where help [] us vs = qsort us ++ [x] ++ qsort vs
            help (y:xs) us vs | x <= y    = help xs us (y:vs)
                              | otherwise = help xs (y:us) vs

Answer 5.7 The base cases T(0, n) = 0 ≤ n and T(m, 0) = 0 are immediate 
and the induction step is 

    1 + T(m − 1, n) max T(m, n − 1) ≤ 1 + (m − 1 + n) max (m + n − 1) = m + n 
In fact one can prove rather more: for m > 0 and n > 0 we have T(m, n) = m + n − 1. 

Answer 5.8 Oh yes. Since halve [x] = ([], [x]) the recursion would not terminate 
if either clause were missing. It is an easy mistake to make. The first two clauses of 
sortsubsi in Figure 5.1 are necessary for the same reason. 

Answer 5.9 Arguing at the function level for a nonempty list, we have 

    flatten . unwrap . until single (pairWith Node) . map Leaf 
  =   { since flatten . unwrap = unwrap . map flatten } 
    unwrap . map flatten . until single (pairWith Node) . map Leaf 
  =   { fusion law of until (see below) } 
    unwrap . until single (pairWith merge) . map (flatten . Leaf) 
  =   { since flatten . Leaf = wrap } 
    unwrap . until single (pairWith merge) . map wrap 

The two fusion conditions are 

    single . map flatten = single 
    map flatten . pairWith Node = pairWith merge . map flatten 

Verification of the conditions is omitted. 
Answer 5.10 In both cases the function is reverse. Of course, a shorter definition 
is foldl (flip (:)) [ ]. 

Answer 5.11 We have to define a comparison function cmp for playing cards. With 
suit as a synonym for head, and rank a synonym for head . tail, and noting that 
"SHDC" is in reverse alphabetical order, we can define 

    cmp c1 c2 = if suit c1 == suit c2 
                then compare (posn (rank c1)) (posn (rank c2)) 
                else compare (suit c2) (suit c1) 
    posn r = head [i | (c, i) <- zip ranks [0 ..], c == r] 
    ranks = "AKQJT98765432" 

Answer 5.12 We can define 

    sortOn f = sortBy (comparing f) 

However, the value of f on one and the same argument may be computed many 
times under this definition. A better method is to define 

    sortOn f xs = map snd (sortBy (comparing fst) (zip (map f xs) xs)) 

Even this definition is not as good as it might be because the list xs is traversed 
twice in the final term. A better definition still is 

    sortOn f = map snd . sortBy (comparing fst) . map (\x -> (f x, x)) 

This is essentially the definition given in Data.List. 
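
The last definition can be tried out directly. In this sketch the function is renamed sortOn' (a name chosen here purely to avoid a clash with the library's own sortOn); sortBy and comparing are the standard functions from Data.List and Data.Ord.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Decorate each element with its key, sort on the key, then drop the key.
-- The key function f is applied exactly once per element.
sortOn' :: Ord b => (a -> b) -> [a] -> [a]
sortOn' f = map snd . sortBy (comparing fst) . map (\x -> (f x, x))
```

Because sortBy is stable, elements with equal keys keep their original relative order.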

Answer 5.13 The function mkheap is defined by 

    mkheap :: Ord a => [a] -> Tree a 
    mkheap [ ] = Null 
    mkheap (x : xs) = Node y (mkheap ys) (mkheap zs) 
      where (y, ys, zs) = split (x : xs) 

    split :: Ord a => [a] -> (a, [a], [a]) 
    split (x : xs) = foldr op (x, [ ], [ ]) xs 
      where op x (y, ys, zs) | x <= y    = (x, y : zs, ys) 
                             | otherwise = (y, x : zs, ys) 

The function split returns three components. The first component is the smallest 
element in the input, while the other two are lists, of as equal a size as possible. 

Answer 5.14 Eliminating the tree gives 

    hsort [ ] = [ ] 
    hsort xs = y : merge (hsort ys) (hsort zs) 
      where (y, ys, zs) = split xs 

But this is just another version of Mergesort. 
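
The following sketch puts split (Answer 5.13) and hsort (Answer 5.14) together so that they can be run. The definition of merge is the usual one for two sorted lists and is an assumption here, since it is defined elsewhere in the chapter; the error clause for the empty list is likewise added only for illustration.

```haskell
-- Split off the minimum and deal the remaining elements alternately
-- into two lists whose lengths differ by at most one.
split :: Ord a => [a] -> (a, [a], [a])
split []       = error "split: empty list"
split (x : xs) = foldr op (x, [], []) xs
  where op v (y, ys, zs) | v <= y    = (v, y : zs, ys)
                         | otherwise = (y, v : zs, ys)

-- Standard merge of two sorted lists (assumed definition).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Heapsort with the heap eliminated.
hsort :: Ord a => [a] -> [a]
hsort [] = []
hsort xs = y : merge (hsort ys) (hsort zs)
  where (y, ys, zs) = split xs
```

For example, split [3, 1, 2] yields (1, [2], [3]): the minimum 1 together with two lists of equal length.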
Answer 5.15 The method is similar to the one used in Mergesort. Define 

    mkpair :: Nat -> [a] -> (Tree a, [a]) 
    mkpair n xs = (mktree (take n xs), drop n xs) 

    mktree xs = fst (mkpair (length xs) xs) 

This time we obtain 

    mkpair 0 xs = (Null, xs) 
    mkpair n (x : xs) = (Node x l r, zs) 
      where (l, ys) = mkpair m xs 
            (r, zs) = mkpair (n - 1 - m) ys 
            m = (n - 1) `div` 2 

Answer 5.16 A size-balanced tree of size n has height H(n), where H(0) = 0 and 
H(n + 1) = 1 + H(⌈n/2⌉). The solution to this recurrence is H(n) = ⌈log(n + 1)⌉. 
The induction step follows from ⌈log(n + 2)⌉ = 1 + ⌈log(⌈n/2⌉ + 1)⌉, whose proof 
is another application of the rule of ceilings. 

Answer 5.17 The answer is simply to count the number of times each element 
occurs. An array can be used to store the count of each element, and the final sorted 
output can then be read off from the array: 

    csort :: Nat -> [Int] -> [Int] 
    csort m xs = concat [replicate k x | (x, k) <- assocs a] 
      where a = accumArray (+) 0 (0, m) [(x, 1) | x <- xs] 

This is Countsort, and takes O(m + n) steps. It was also presented in Section 3.3. 

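
As a runnable sketch of Countsort, the definition can be compiled against Data.Array, assuming the array functions meant are the standard accumArray and assocs; Int stands in for Nat, and all elements are assumed to lie in the range 0 to m.

```haskell
import Data.Array (accumArray, assocs)

-- csort m xs sorts xs, assuming every element of xs lies in 0..m.
-- The array a maps each value to the number of times it occurs.
csort :: Int -> [Int] -> [Int]
csort m xs = concat [replicate k x | (x, k) <- assocs a]
  where a = accumArray (+) 0 (0, m) [(x, 1) | x <- xs]
```

An element outside the range 0..m makes accumArray fail with an index error, so the bound m has to be chosen to cover the input.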
Answer 5.18 We have 

    tmap f (Leaf x) = Leaf (f x) 
    tmap f (Node ts) = Node (map (tmap f) ts) 

Here is the proof of (5.3): 

    map (bsort ds) (ptn d xs) 
  =   { definition of ptn } 
    [bsort ds (filter ((== m) . d) xs) | m <- rng] 
  =   { definition of bsort } 
    [(flatten . mktree ds . filter ((== m) . d)) xs | m <- rng] 
  =   { assumptions (5.4) and (5.5) } 
    [(filter ((== m) . d) . flatten . mktree ds) xs | m <- rng] 
  =   { definition of bsort } 
    [filter ((== m) . d) (bsort ds xs) | m <- rng] 
  =   { definition of ptn } 
    ptn d (bsort ds xs) 

The proof of (5.4) is by induction over the discriminators. Unlike the proof above, it 
is carried out in a point-free style. It is easy to show 

    mktree [ ] . filter p = tmap (filter p) . mktree [ ] 
and that establishes the base case. For the induction step we reason 

    mktree (d : ds) . filter p 
  =   { definition of mktree } 
    Node . map (mktree ds) . ptn d . filter p 
  =   { assumption (5.6) } 
    Node . map (mktree ds . filter p) . ptn d 
  =   { induction } 
    Node . map (tmap (filter p) . mktree ds) . ptn d 
  =   { definition of tmap } 
    tmap (filter p) . Node . map (mktree ds) . ptn d 
  =   { definition of mktree } 
    tmap (filter p) . mktree (d : ds) 

For the proof of (5.6) we reason 

    map (filter p) (ptn d xs) 
  =   { definition of ptn } 
    [filter p (filter ((== m) . d) xs) | m <- rng] 
  =   { filters of total predicates commute } 
    [filter ((== m) . d) (filter p xs) | m <- rng] 
  =   { definition of ptn } 
    ptn d (filter p xs) 

Finally, we prove (5.5) by induction over the tree. The induction step is 

    flatten (tmap (filter p) (Node ts)) 
  =   { definition of tmap } 
    flatten (Node (map (tmap (filter p)) ts)) 
  =   { definition of flatten } 
    concat (map (flatten . tmap (filter p)) ts) 
  =   { induction } 
    concat (map (filter p . flatten) ts) 
  =   { claim; see below } 
    filter p (concat (map flatten ts)) 
  =   { definition of flatten } 
    filter p (flatten (Node ts)) 

The claim is that 

    concat . map (filter p) = filter p . concat 
But enough is enough, and we omit the proof. 

Answer 5.19 One way: 

    wsort :: [Word] -> [Word] 
    wsort [ ] = [ ] 
    wsort xss = rsort ('a', 'z') ds xss 
      where ds = [\xs -> (xs ++ repeat 'a') !! i | i <- [0 .. k - 1]] 
            k = maximum [length xs | xs <- xss] 


Answer 5.20 The proof is as follows: 

    x − y ≤ x′ − y′ 
  ⟺   { adding y to both sides } 
    (x − y) + y ≤ (x′ − y′) + y 
  ⟺   { associativity and commutativity } 
    x + (y − y) ≤ (y − y′) + x′ 
  ⟺   { since y − y = 0 and x + 0 = x } 
    x ≤ (y − y′) + x′ 
  ⟺   { subtracting x′ from both sides } 
    (x − x′) ≤ (y − y′) 




Chapter 6 


Selection 


This chapter describes four related problems in which a divide-and-conquer strategy 
can be employed to good effect. All involve selection of some kind, whether from 
one set, two sets, or the complement of a set. The primary example is the problem 
of selecting the kth smallest element in a set, where k can be anything from 1 to n, 
the size of the set. Finding the minimum element (the 1st smallest) or the maximum 
element (the nth smallest) are special cases, as is the problem of finding the median 
element (roughly, the ⌊n/2⌋th smallest, but see below). We shall also consider the 
problem of selecting the kth smallest in the union of two given sets, and finding the 
smallest number not in a given set of natural numbers. Efficient solutions to these 
problems depend on how the set is represented, whether by a list, with or without 
duplicates, by an array, or by a tree of some kind. 


6.1 Minimum and maximum 

By way of a warm-up, let us start with the problem of computing the minimum and 
maximum elements of a finite nonempty list. The standard definitions are 

    minimum, maximum :: Ord a => [a] -> a 
    minimum = foldr1 min 
    maximum = foldr1 max 

The prelude function foldr1, and its companion foldl1, are fold functions for 
nonempty lists: 

    foldr1, foldl1 :: (a -> a -> a) -> [a] -> a 
    foldr1 f [x] = x 
    foldr1 f (x : xs) = f x (foldr1 f xs) 
    foldl1 f (x : xs) = foldl f x xs 

For example, 

    foldr1 (⊕) [w, x, y, z] = w ⊕ (x ⊕ (y ⊕ z)) 
    foldl1 (⊕) [w, x, y, z] = ((w ⊕ x) ⊕ y) ⊕ z 

The definition of minimum and maximum uses foldr1, which processes the list 
from right to left; we can also use foldl1 and process from left to right because min 
and max are associative operations. In either direction there are n − 1 evaluations of 
min or max for a list of length n. Each evaluation involves a single comparison, so 
there are n − 1 comparisons in total. This is the best one can achieve. Think of an 
algorithm for determining the maximum as a tennis tournament between n players, 
in which the outcome of a single match corresponds to the outcome of a single 
comparison. Every player apart from the eventual winner must lose a match, so 
there must be n − 1 matches. 

If we want both the minimum and maximum elements, then 2n − 2 comparisons 
are certainly sufficient. The obvious method involves two passes over the input, but 
it can be reduced to one pass by making use of the tupling law 

    (foldr f1 e1 xs, foldr f2 e2 xs) = foldr f (e1, e2) xs 
      where f x (y, z) = (f1 x y, f2 x z) 

Although it is not true in general that 

    foldr1 f (x : xs) = foldr f x xs 

we do have 

    minimum (x : xs) = foldr min x xs 
    maximum (x : xs) = foldr max x xs 

because both min and max are commutative as well as associative. It follows that 

    minmax :: Ord a => [a] -> (a, a) 
    minmax (x : xs) = foldr op (x, x) xs 
      where op x (y, z) = (min x y, max x z) 

The number of comparisons can be reduced by rewriting op, using the fact that the 
current minimum is never greater than the current maximum: 

    minmax :: Ord a => [a] -> (a, a) 
    minmax (x : xs) = foldr op (x, x) xs 
      where op x (y, z) | x < y     = (x, z) 
                        | z < x     = (y, x) 
                        | otherwise = (y, z) 

At each step the current minimum and maximum are updated according to whether 
the new element is smaller than the current minimum, larger than the current 
maximum, or in between. In the worst case there are two comparisons at each step, 
giving 2n − 2 comparisons in total, the same number as before. However, in the best 
case there are only n − 1 comparisons. Examples of best-case and worst-case inputs 
are set as exercises. 
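
A minimal runnable sketch of the one-pass version follows. The clause for the empty list is an addition made here so that misuse fails with a clear message; the text only defines minmax on nonempty lists.

```haskell
-- One-pass minimum and maximum of a nonempty list.
-- The second guard is tested only when the first fails, so an element
-- that is a new minimum costs one comparison rather than two.
minmax :: Ord a => [a] -> (a, a)
minmax []       = error "minmax: empty list"
minmax (x : xs) = foldr op (x, x) xs
  where op v (lo, hi) | v < lo    = (v, hi)
                      | hi < v    = (lo, v)
                      | otherwise = (lo, hi)
```

Since foldr consumes the list from the right, the pair (lo, hi) at each step describes the elements to the right of v together with the head x.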

Here is a divide-and-conquer algorithm for the same problem: 

    minmax [x] = (x, x) 
    minmax [x, y] = if x <= y then (x, y) else (y, x) 
    minmax xs = (min a1 a2, max b1 b2) 
      where (a1, b1) = minmax ys 
            (a2, b2) = minmax zs 
            (ys, zs) = halve xs 

The function halve was defined in Section 5.2. The minimum and maximum element 
in a singleton list coincide, so the answer can be computed with zero comparisons. 
Otherwise, the input is divided into two equal halves, the results for both halves 
are computed recursively, and the final answer is obtained by comparing the two 
minimums and the two maximums. The case of a doubleton list is treated separately 
simply because the total number of comparisons can be reduced from two to one in 
this case. 

The total running time of this version of minmax goes up to O(n log n) steps, 
which hardly seems to make it worthwhile. But, counting comparisons only, the 
number C(n) of comparisons satisfies the recurrence relation 

    C(1) = 0 
    C(2) = 1 
    C(n) = C(⌊n/2⌋) + C(⌈n/2⌉) + 2 

This recurrence is difficult to solve exactly, though it is easy enough to show that 
C(n) = 3n/2 − 2 when n is a positive power of two. Thus the divide-and-conquer 
algorithm can save a quarter of the comparisons. 
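
The recurrence is easy to tabulate. The sketch below transcribes it directly; the name dcCount is chosen here for illustration, and the argument is assumed to be at least 1 (the recurrence has no case for 0).

```haskell
-- Number of comparisons made by the divide-and-conquer minmax on n elements,
-- transcribed from the recurrence for C(n). Defined for n >= 1 only.
dcCount :: Int -> Int
dcCount 1 = 0
dcCount 2 = 1
dcCount n = dcCount (n `div` 2) + dcCount ((n + 1) `div` 2) + 2
```

Evaluating dcCount on powers of two reproduces the closed form 3n/2 − 2; for example dcCount 16 unfolds to 2 × dcCount 8 + 2 with dcCount 8 = 10.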

In fact, the divide-and-conquer algorithm is not the best possible. For example, 
C(12) = 18 but the minimum and maximum of 12 elements can be computed with 
only 16 comparisons. To see why, here is an algorithm for minmax obtained by 
switching to a bottom-up scheme: 

    minmax = unwrap . until single (pairWith op) . mkPairs 
      where op (a1, b1) (a2, b2) = (min a1 a2, max b1 b2) 

    pairWith f [ ] = [ ] 
    pairWith f [x] = [x] 
    pairWith f (x : y : xs) = f x y : pairWith f xs 

    mkPairs [ ] = [ ] 
    mkPairs [x] = [(x, x)] 
    mkPairs (x : y : xs) = if x <= y then (x, y) : mkPairs xs else (y, x) : mkPairs xs 

With C(n) as the comparison count, we have 

    C(1) = 0 
    C(2) = 1 
    C(n) = ⌊n/2⌋ + D(⌈n/2⌉) 

where ⌊n/2⌋ accounts for the number of comparisons to compute mkPairs and 

    D(1) = 0 
    D(n) = 2⌊n/2⌋ + D(⌈n/2⌉) 

It is easy enough to show that D(n) = 2(n − 1), so C(n) = n + ⌈n/2⌉ − 2. In particular 
C(12) = 16. In fact, C(n) is also a lower bound on the number of comparisons 
needed to compute the minimum and maximum of n elements in the worst case (see 
Answer 6.7). Moreover, the total running time of minmax is Θ(n) steps. However, 
the bottom-up algorithm is not recommended as a method for computing minmax 
because the other constants involved are larger than with the naive version. 
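
For experimentation, here is a self-contained sketch of the bottom-up scheme. The definitions of unwrap and single are assumptions (they come from the previous chapter and are reconstructed here in the obvious way); until is the Prelude function. The input is assumed to be nonempty, since on the empty list the until loop would never reach a singleton.

```haskell
-- Bottom-up minmax: first pair up neighbours (one comparison per pair),
-- then repeatedly combine adjacent (min, max) pairs until one remains.
minmax :: Ord a => [a] -> (a, a)
minmax = unwrap . until single (pairWith op) . mkPairs
  where op (a1, b1) (a2, b2) = (min a1 a2, max b1 b2)

-- Assumed helpers: extract the element of a singleton, test for a singleton.
unwrap :: [a] -> a
unwrap [x] = x
unwrap _   = error "unwrap: not a singleton"

single :: [a] -> Bool
single [_] = True
single _   = False

pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith _ []           = []
pairWith _ [x]          = [x]
pairWith f (x : y : xs) = f x y : pairWith f xs

mkPairs :: Ord a => [a] -> [(a, a)]
mkPairs []           = []
mkPairs [x]          = [(x, x)]
mkPairs (x : y : xs) = if x <= y then (x, y) : mkPairs xs else (y, x) : mkPairs xs
```

On a 12-element list, mkPairs produces 6 pairs, which are then combined in rounds of 3, 2 and 1 pairs.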


6.2 Selection from one set 

By definition, the kth smallest element of a set (or a list without duplicates; see 
Exercise 6.1) with n elements has exactly k − 1 elements smaller than it, so the 1st 
smallest is the smallest, and the nth smallest is the largest. The median m of a set 
with an odd size n is an element of the set for which there are exactly ⌊n/2⌋ elements 
smaller than m and ⌊n/2⌋ elements greater than m. When n is even there is a choice 
of definition, and we pick the one that has ⌊n/2⌋ − 1 smaller elements and ⌊n/2⌋ 
larger elements. This defines what is sometimes called the lower median. Combining 
the two cases, we have that the median of a set of size n has ⌊(n + 1)/2⌋ − 1 smaller 
elements and ⌈(n + 1)/2⌉ − 1 greater elements. 

When a set is represented by a list with no duplicates, the kth smallest appears at 
position k — 1 in a sorted list. Hence we can define 

    select :: Ord a => Nat -> [a] -> a 
    select k xs = (sort xs) !! (k - 1) 

In particular, 

    median xs = select k xs where k = (length xs + 1) `div` 2 
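
The specification can be run as it stands. In this sketch Int stands in for Nat and sort is the function from Data.List; k is assumed to lie between 1 and the length of the list.

```haskell
import Data.List (sort)

-- Specification: the kth smallest element, counting from 1.
select :: Ord a => Int -> [a] -> a
select k xs = sort xs !! (k - 1)

-- The (lower) median is the element at position (n + 1) `div` 2.
median :: Ord a => [a] -> a
median xs = select k xs where k = (length xs + 1) `div` 2
```

For a list of even length such as [4, 1, 3, 2], k is 2, so median returns the lower of the two middle values.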

The definitions of select and median make sense when xs contains duplicates, and 
in what follows we will not assume that the elements of xs are all different. Since 
sorting a list takes O(n log n) steps, the running time of select is O(n log n) steps. 
This is an upper bound rather than an exact bound because, under lazy evaluation, 
the whole list may not have to be sorted in order to retrieve the kth element; it 
depends on the sorting algorithm. On the other hand, the running time is Ω(n) steps 
because it takes this time just to inspect every element of the input. Can these lower 
and upper bounds be made to coincide, that is, can select be computed in linear 
time? The answer is yes, and the algorithm is yet another clever example of divide 
and conquer. 

First, though, let us replace the function sort in the definition of select by a 
modified version of qsort, the Quicksort algorithm described in the previous chapter: 

    qsort [ ] = [ ] 
    qsort xs = qsort us ++ vs ++ qsort ws 
      where (us, vs, ws) = partition3 (pivot xs) xs 

Here pivot chooses some element of a list as the pivot and partition3, whose 
definition was given in Answer 4.10, splits a list into three lists: those elements less than, 
equal to, or greater than the pivot. Splitting a list into three sublists rather than two 
is better whenever duplicate elements may be present. 

We can synthesise a faster version of select by exploiting the following divide- 
and-conquer property of the list-indexing operation: 


    (xs ++ ys) !! k = if k < n then xs !! k else ys !! (k - n) where n = length xs 

Now we can reason for nonempty xs: 

    select k xs 
  =   { definition } 
    qsort xs !! (k - 1) 
  =   { with (us, vs, ws) = partition3 (pivot xs) xs } 
    (qsort us ++ vs ++ qsort ws) !! (k - 1) 
  =   { assuming k - 1 < length us } 
    qsort us !! (k - 1) 
  =   { definition of select } 
    select k us 

The other cases can be dealt with in a similar fashion, and we end up with the 
following definition of select: 

    select :: Ord a => Nat -> [a] -> a 
    select k xs 
      | k <= m     = select k us 
      | k <= m + n = vs !! (k - m - 1) 
      | k > m + n  = select (k - m - n) ws 
      where (us, vs, ws) = partition3 (pivot xs) xs 
            (m, n) = (length us, length vs) 

The middle list vs is not empty because it contains at least one copy of the pivot. Let 
T(n) be the running time of this version of select on an input of length n. Assuming 
partition3 splits the list into three lists, the first and third having length at most 
(n − 1)/2, we have 

    T(n) ≤ T((n − 1)/2) + Θ(n) 

with solution T(n) = O(n). On the other hand, partition3 may return a very unequal 
split, so the running time in the worst case satisfies 

    T(n) = T(n − 1) + Θ(n) 

with solution T(n) = Θ(n²). If the partitioning uses the median of the list as pivot, 
then there will be an equal split and the result will be a linear-time algorithm for 
select. But that involves finding the median in linear time, which is essentially what 
we are trying to achieve in the first place. The idea therefore seems circular and 
without obvious merit. 

However, it can be made to work. The wiggle room is that we do not have to 
choose the actual median as the pivot: any value will do as long as the result of 
partitioning is three lists, the sum of the lengths of the first and third lists being some 
proper fraction of the input. We shall see why later on. The method for choosing 
pivot is a very cunning divide-and-conquer scheme. First, divide the input into 
groups of five by applying group 5, where 

    group :: Nat -> [a] -> [[a]] 
    group n [ ] = [ ] 
    group n xs = ys : group n zs where (ys, zs) = splitAt n xs 

If the length of the input is not an exact multiple of five, there will be a trailing 
group of shorter length. For example, 

    group 5 [1 .. 12] = [[1 .. 5], [6 .. 10], [11, 12]] 

There are therefore ⌈n/5⌉ groups. Computing group takes linear time. 

Next, find the median of each group by sorting each group and taking the middle 
element of the result. That is, compute 

    medians = map (middle . sort) . group 5 
      where middle xs = xs !! ((length xs + 1) `div` 2 - 1) 

For example, medians [1 .. 12] = [3, 8, 11]. Computing medians also takes linear 
time: sorting each group of five elements takes constant time, as does evaluation of 
middle, and there are ⌈n/5⌉ groups. 

Finally, define pivot to select the median of these medians by applying the 
algorithm for select recursively. That gives 

    pivot :: Ord a => [a] -> a 
    pivot [x] = x 
    pivot xs = median (medians xs) 
      where median xs = select ((length xs + 1) `div` 2) xs 

The clause pivot [x] = x has to be included as a special case: without it we would 
have pivot [x] = select 1 [x], and computing the right-hand side would involve 
computing pivot [x] again, so the whole program would spin off into an infinite 
loop. 
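
Putting the pieces together gives a complete program that can be checked against the sorting specification. This is a sketch under stated assumptions: Int stands in for Nat, the three-way partitioning function is written out here with list comprehensions under the name partition3 because Answer 4.10 is not reproduced in this chapter, and the input is assumed nonempty with k between 1 and its length.

```haskell
import Data.List (sort)

-- Assumed stand-in for the three-way partition of Answer 4.10:
-- elements less than, equal to, and greater than the pivot p.
partition3 :: Ord a => a -> [a] -> ([a], [a], [a])
partition3 p xs = ([x | x <- xs, x < p], [x | x <- xs, x == p], [x | x <- xs, x > p])

group :: Int -> [a] -> [[a]]
group _ [] = []
group n xs = ys : group n zs where (ys, zs) = splitAt n xs

-- Median of each group of five, found by sorting the (constant-size) group.
medians :: Ord a => [a] -> [a]
medians = map (middle . sort) . group 5
  where middle xs = xs !! ((length xs + 1) `div` 2 - 1)

-- The pivot is the median of the medians, found by a recursive call of select.
pivot :: Ord a => [a] -> a
pivot [x] = x
pivot xs  = select ((length ms + 1) `div` 2) ms
  where ms = medians xs

-- select k xs returns the kth smallest element of xs (1 <= k <= length xs).
select :: Ord a => Int -> [a] -> a
select k xs
  | k <= m     = select k us
  | k <= m + n = vs !! (k - m - 1)
  | otherwise  = select (k - m - n) ws
  where (us, vs, ws) = partition3 (pivot xs) xs
        (m, n)       = (length us, length vs)
```

The only use of sort is on groups of at most five elements, so it contributes a constant amount of work per group.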

To estimate the running time T(n) of select in the worst case, observe that finding 
the median of the medians takes T(⌈n/5⌉) + O(n) steps because pivot calls select 
recursively on a list of length ⌈n/5⌉. The partitioning step takes O(n) steps on a 
list of length n. To these times we have also to add in the running time to select 
from either the first or third list returned by partition3. We claim that each of these 
lists has length at most 7n/10 for an input of length n. To appreciate why, here is a 
picture for a particular input of length 28: 
The columns are the groups of length 5, except for the last column. Each of these 
groups has been sorted, and the columns have been arranged in increasing order 
of their median. The median m = 29 of the medians has been highlighted. The 
algorithm does not arrange the columns in this way; the picture is there just to 
explain what we can say about how m partitions the list. 

The key property of m is as follows. The bottom-left rectangle with upper-right 
corner at m contains only elements less than or equal to m. The number of elements 
in this rectangle, apart from m, is 

    3⌊(⌈n/5⌉ + 1)/2⌋ − 1 ≥ 3n/10 

(see Exercise 6.10). That means the number of elements greater than m is at most 
7n/10. By reasoning similarly about the top-right rectangle with bottom-left corner 
at m we have that the number of elements less than m is at most 7n/10. That means 
that in the worst case select is called recursively on a list of size at most 7n/10. 
Ignoring floors and ceilings, the total running time therefore satisfies 

    T(n) ≤ T(n/5) + T(7n/10) + cn 

for some c. It is easy to show by induction that T(n) ≤ bn for some appropriately 
chosen b. For the induction step, we have to show 

    bn/5 + 7bn/10 + cn ≤ bn 

This inequality is immediate on taking b = 10c, since then cn = bn/10 and the 
left-hand side is exactly bn. In summary, select can be computed in linear time. 


6.3 Selection from two sets 

Continuing the theme of selection, consider the problem of computing select, where 
this time we define 

    select :: Ord a => Nat -> [a] -> [a] -> a 
    select k as bs = (merge as bs) !! k 

Thus select takes a number k and two sorted lists, and returns the element at position 
k in the merged list. In particular, select 0 as bs returns the smallest element in 
merge as bs. 

How long does the computation of select take? Clearly, if both lists have length n, 
then select takes O(n) steps because two lists can be merged in this time. However, 
if they are given as two sorted arrays, or two balanced binary search trees, then 
the time can be reduced to O(log n) steps. The result is a little surprising because 
two arrays or two binary search trees certainly cannot be merged in less than 
linear time. The faster algorithm is yet another example of divide and conquer, 
and the proof that it works hinges on a subtle relationship between merging and 
selection. 

Here is the relationship: provided the arguments of merge are two sorted lists, we 
have 

    merge (xs ++ [a] ++ ys) (us ++ [b] ++ vs) !! k 
      | a <= b && k <= p + q = merge (xs ++ [a] ++ ys) us !! k 
      | a <= b && k > p + q  = merge ys (us ++ [b] ++ vs) !! (k - p - 1) 
      | b <= a && k <= p + q = merge xs (us ++ [b] ++ vs) !! k 
      | b <= a && k > p + q  = merge (xs ++ [a] ++ ys) vs !! (k - q - 1) 
      where p = length xs; q = length us 

The proof is given later on. Using the relationship, it is possible to derive the 
following definition of select: 

    select k [ ] bs = bs !! k 
    select k as [ ] = as !! k 
    select k as bs | a <= b && k <= p + q = select k as us             -- line 3 
                   | a <= b && k > p + q  = select (k - p - 1) ys bs   -- line 4 
                   | b <= a && k <= p + q = select k xs bs             -- line 5 
                   | b <= a && k > p + q  = select (k - q - 1) as vs   -- line 6 
      where p = (length as) `div` 2 
            q = (length bs) `div` 2 
            (xs, a : ys) = splitAt p as 
            (us, b : vs) = splitAt q bs 


The derivation is similar to the one involving qsort in the previous section and we 
won’t go into details. Instead, here is the trace of an example of select in which 
k = 6: 

    p  q  k   as                     bs 
    3  3  6   [1,4,4,_7_,8,11,15]    [2,5,9,_11_,15,16,20]   -- line 3 
    3  1  6   [1,4,4,_7_,8,11,15]    [2,_5_,9]               -- line 6 
    3  0  4   [1,4,4,_7_,8,11,15]    [_9_]                   -- line 4 
    1  0  0   [8,_11_,15]            [_9_]                   -- line 5 
    0  0  0   [_8_]                  [_9_]                   -- line 3 
    0  0  0   [8]                    [] 

The values of a and b are marked with underscores. The last column gives the line number 
of the recursive call of select used for the next step. The final value of select on the 
two lists is 8. 
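
The list version of select can be run against its specification on the lists of the trace. In this sketch Int stands in for Nat, merge is assumed to be the usual merge of two sorted lists, and the name selectSpec is introduced here only to keep the specification and the derived program apart.

```haskell
-- Standard merge of two sorted lists (assumed definition).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Specification: the element at position k (from 0) of the merged list.
selectSpec :: Ord a => Int -> [a] -> [a] -> a
selectSpec k as bs = merge as bs !! k

-- Derived version: each step discards about half of one of the two lists.
select :: Ord a => Int -> [a] -> [a] -> a
select k [] bs = bs !! k
select k as [] = as !! k
select k as bs
  | a <= b && k <= p + q = select k as us
  | a <= b && k > p + q  = select (k - p - 1) ys bs
  | b <= a && k <= p + q = select k xs bs
  | otherwise            = select (k - q - 1) as vs
  where p = length as `div` 2
        q = length bs `div` 2
        (xs, a : ys) = splitAt p as
        (us, b : vs) = splitAt q bs
```

With as = [1,4,4,7,8,11,15] and bs = [2,5,9,11,15,16,20], the call select 6 as bs follows the sequence of guards shown in the trace and returns 8.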

Since select chooses the values of a and b to be the middle elements of the two 
lists, half of one of the lists is discarded at each step. Ignoring completely the cost 
of evaluating the local definitions, the running time of select therefore satisfies 

    T(m, n) = T(m, n/2) max T(m/2, n) + O(1) 

with solution T(m, n) = O(log m + log n). Of course, evaluations of the local 
definitions take linear rather than constant time, so the true timing estimate is 

    T(m, n) = T(m, n/2) max T(m/2, n) + Θ(m + n) 

with solution T(m, n) = Θ(m + n). The divide-and-conquer algorithm is therefore 
no faster than the naive version we started out with. The pay-off comes when the 
two lists are given as two arrays or two binary search trees. Then we can arrange 
that the cost of the local evaluations is indeed constant. 

We will spell out the details for binary search trees, leaving arrays to the exercises. 
Recall from Chapter 4 that a binary search tree is a tree of type 

    data Tree a = Null | Node Nat (Tree a) a (Tree a) 

with the property that, provided the tree labels are values of an ordered type, flattening 
a tree produces a list in ascending order. Recall also that the Nat component of 
a Node contains the height of the tree. However, this time we need to assume that 
the integer is not the height, but the size of the tree. Thus 

    size Null = 0 
    size (Node s l x r) = s 

Of course, installing such information takes time, but we will ignore the cost of 
doing so. With that assumption, the task is to compute 

    select :: Ord a => Nat -> Tree a -> Tree a -> a 
    select k t1 t2 = merge (flatten t1) (flatten t2) !! k 

The faster algorithm is given by 

    select k t1 Null = index t1 k 
    select k Null t2 = index t2 k 
    select k (Node h1 l1 a r1) (Node h2 l2 b r2) 
      | a <= b && k <= p + q = select k (Node h1 l1 a r1) l2 
      | a <= b && k > p + q  = select (k - p - 1) r1 (Node h2 l2 b r2) 
      | b <= a && k <= p + q = select k l1 (Node h2 l2 b r2) 
      | b <= a && k > p + q  = select (k - q - 1) (Node h1 l1 a r1) r2 
      where p = size l1; q = size l2 

The function index is specified by 

    index t k = flatten t !! k 

It is easy to derive 

    index (Node _ l x r) k 
      | k < p  = index l k 
      | k == p = x 
      | k > p  = index r (k - p - 1) 
      where p = size l 

and we leave details as an exercise. 
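
The tree version can be exercised on the same two sequences as the list trace. The sketch below makes several assumptions: Int stands in for Nat, the helper fromSorted (a name invented here) builds a size-annotated balanced tree from a sorted list, and an error clause is added to index for an out-of-range position.

```haskell
-- Binary search tree whose Int field records the size of the tree.
data Tree a = Null | Node Int (Tree a) a (Tree a)

size :: Tree a -> Int
size Null           = 0
size (Node s _ _ _) = s

-- Hypothetical helper: build a balanced, size-annotated tree from a sorted list.
fromSorted :: [a] -> Tree a
fromSorted [] = Null
fromSorted xs = Node (length xs) (fromSorted ls) x (fromSorted rs)
  where (ls, x : rs) = splitAt (length xs `div` 2) xs

-- index t k is the element at position k (from 0) of the flattened tree.
index :: Tree a -> Int -> a
index Null _ = error "index: position out of range"
index (Node _ l x r) k
  | k < p     = index l k
  | k == p    = x
  | otherwise = index r (k - p - 1)
  where p = size l

-- Each step descends one level in one of the two trees.
select :: Ord a => Int -> Tree a -> Tree a -> a
select k t1 Null = index t1 k
select k Null t2 = index t2 k
select k t1@(Node _ l1 a r1) t2@(Node _ l2 b r2)
  | a <= b && k <= p + q = select k t1 l2
  | a <= b && k > p + q  = select (k - p - 1) r1 t2
  | b <= a && k <= p + q = select k l1 t2
  | otherwise            = select (k - q - 1) t1 r2
  where p = size l1
        q = size l2
```

Because size is a constant-time field lookup, the local definitions p and q no longer cost linear time, which is the point of switching from lists to trees.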

Now for the tricky part, the proof of the relationship between merging and 
selection. Recall that we have to simplify 

    merge (xs ++ [a] ++ ys) (us ++ [b] ++ vs) !! k 

We will assume that a ≤ b; the case b ≤ a is entirely dual. Furthermore, let p be the 
length of xs and q the length of us in what follows. 

We will need to make use of two decomposition rules, one for list-indexing and 
one for merging. The decomposition rule for indexing has been used before: 

    (xs ++ ys) !! k = if k < n then xs !! k else ys !! (k - n) where n = length xs 

To state the decomposition rule for merge, first define <<= by 

    (<<=) :: Ord a => [a] -> [a] -> Bool 
    xs <<= ys = and [x <= y | x <- xs, y <- ys] 

Thus xs <<= ys holds if no element of xs is larger than any element of ys. The 
decomposition rule for merge now states that 

    merge (xs ++ ys) (us ++ vs) = merge xs us ++ merge ys vs 

provided xs <<= vs and us <<= ys. For the proof observe that xs <<= ys because the 
list xs ++ ys is sorted. If xs <<= vs holds as well, then 

    xs <<= merge ys vs 

Similarly, us <<= merge ys vs if us <<= ys. Hence, if both hold, then 

    merge xs us <<= merge ys vs 

from which the decomposition rule for merge follows. 
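
The decomposition rule for merge can be checked on concrete instances. In this sketch the ordering on lists is written as the ASCII operator (<<=), merge is assumed to be the usual merge of sorted lists, and the function decomposes (a name chosen here) reports Nothing when the side conditions of the rule do not hold. The lists xs ++ ys and us ++ vs are assumed to be sorted.

```haskell
-- xs <<= ys holds if no element of xs is larger than any element of ys.
(<<=) :: Ord a => [a] -> [a] -> Bool
xs <<= ys = and [x <= y | x <- xs, y <- ys]

-- Standard merge of two sorted lists (assumed definition).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

-- Check one instance of the decomposition rule; Nothing means the
-- side conditions xs <<= vs and us <<= ys are not satisfied.
decomposes :: Ord a => [a] -> [a] -> [a] -> [a] -> Maybe Bool
decomposes xs ys us vs
  | (xs <<= vs) && (us <<= ys) =
      Just (merge (xs ++ ys) (us ++ vs) == merge xs us ++ merge ys vs)
  | otherwise = Nothing
```

For example, with xs = [1,3], ys = [6,8], us = [2,4] and vs = [5,9] both side conditions hold, and both sides of the rule evaluate to [1,2,3,4,5,6,8,9].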

As well as the two decomposition rules, we will need two other observations. 
First, suppose ys = ys1 ++ ys2, where ys1 is the longest prefix of ys such that 

    xs ++ [a] ++ ys1 <<= [b] ++ vs 

Then we claim us <<= ys2. Either ys2 is empty, in which case the result is immediate, 
or its first element is greater than b, which means it is greater than any element in 
us. As a consequence, we have 

    merge (xs ++ [a] ++ ys1 ++ ys2) (us ++ [b] ++ vs) 
      = merge (xs ++ [a] ++ ys1) us ++ merge ys2 ([b] ++ vs) 

by the decomposition rule for merge. 

The second observation is dual. Suppose us = us1 ++ us2, where us2 is the longest 
suffix of us such that 

    xs ++ [a] <<= us2 ++ [b] ++ vs 

then us1 <<= ys. That means 

    merge (xs ++ [a] ++ ys) (us1 ++ us2 ++ [b] ++ vs) 
      = merge (xs ++ [a]) us1 ++ merge ys (us2 ++ [b] ++ vs) 

We are now ready for the main calculation. Assume first that k ≤ p + q. We reason 

    merge (xs ++ [a] ++ ys) (us ++ [b] ++ vs) !! k 
  =   { choose ys1 and ys2 as above } 
    merge (xs ++ [a] ++ ys1 ++ ys2) (us ++ [b] ++ vs) !! k 
  =   { decomposition rule of merge; see above } 
    (merge (xs ++ [a] ++ ys1) us ++ merge ys2 ([b] ++ vs)) !! k 
  =   { assumption k ≤ p + q and decomposition rule of (!!) } 
    merge (xs ++ [a] ++ ys1) us !! k 
  =   { decomposition rule of (!!) again } 
    (merge (xs ++ [a] ++ ys1) us ++ merge ys2 [ ]) !! k 
  =   { decomposition rule of merge again } 
    merge (xs ++ [a] ++ ys) us !! k 

The second case is when k > p + q. Let q1 be the length of us1, where us1 is as 
defined above. Then we have 

    p + 1 + q1 ≤ p + 1 + q ≤ k 

This time we reason 

    merge (xs ++ [a] ++ ys) (us ++ [b] ++ vs) !! k 
  =   { choose us1 and us2 as above } 
    merge (xs ++ [a] ++ ys) (us1 ++ us2 ++ [b] ++ vs) !! k 
  =   { decomposition property of merge; see above } 
    (merge (xs ++ [a]) us1 ++ merge ys (us2 ++ [b] ++ vs)) !! k 
  =   { assumption k > p + q and decomposition rule of (!!) } 
    merge ys (us2 ++ [b] ++ vs) !! (k - p - 1 - q1) 
  =   { decomposition properties of (!!) and merge again } 
    merge ys (us ++ [b] ++ vs) !! (k - p - 1) 

The proof is complete. 


6.4 Selection from the complement of a set 

Sometimes we want to select from the complement of a set. For example, given a 
set of five-letter words, we might want the (lexically) smallest five-letter word not 
in the set. Or we might want the smallest natural number not in a given finite set of 
natural numbers. The problem is a simplification of a common programming task 
in which the set represents objects currently in use and one wants to select some 
object not in use, say the one with the smallest name. In this section we tackle the 
natural number version of the problem, supposing the set is given as a list without 
duplicates in no particular order. For example, 

[08,23,09,00,12,11,01,10,13,07,41,04,14,21,05,17,03,19,02,06] 

How would you go about finding the smallest natural number not in this list? 

Here is the specification of the problem: 

    select :: [Nat] -> Nat 
    select xs = head ([0 ..] \\ xs) 

    (\\) :: Eq a => [a] -> [a] -> [a] 
    xs \\ ys = filter (`notElem` ys) xs 

The value xs \\ ys (pronounced ‘xs minus ys’) is what remains when every element 
of ys is removed from xs. Evaluation of select on a list of length n takes Ω(n²) steps 
in the worst case. For example, evaluation of select [0 .. n − 1] requires n + 1 
membership tests, the total cost of which is n(n + 1)/2 equality tests. 

One idea that quickly springs to mind for improving the running time of select is 
to sort the input. Since the order of the elements in the input is not material, we have 

select xs = head ([0.. ] \\ sort xs) 

Now that the right-hand argument to \\ is ordered, we can simply look for the first 
gap: 

    select xs = searchFrom 0 (sort xs) 

    searchFrom k [ ] = k 
    searchFrom k (x : xs) = if k == x then searchFrom (k + 1) xs else k 
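
A runnable sketch of the sort-and-search version follows, with Int standing in for Nat and sort taken from Data.List. As in the text, the input is assumed to be a list of natural numbers without duplicates.

```haskell
import Data.List (sort)

-- Smallest natural number not in xs, assuming xs has no duplicates.
select :: [Int] -> Int
select xs = searchFrom 0 (sort xs)

-- Walk along the sorted list until the expected value k is missing.
searchFrom :: Int -> [Int] -> Int
searchFrom k []       = k
searchFrom k (x : xs) = if k == x then searchFrom (k + 1) xs else k
```

On the example list given at the start of this section the numbers 0 to 14 are all present and 15 is not, so the search stops at 15.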

This improves the running time of select to 0{n log n) steps, assuming an asymptot¬ 
ically optimal sorting algorithm is used. However, we can do better still and reduce 
the time to 0(w) steps. The key observation is that it is not necessary to sort all of 
the input: only those elements that are at most n, the size of the set, need be sorted. 
The reason is that not every number in {0, can be in the set: there are n -|- 1 

numbers in the former and only n in the latter. That means we can define 

select xs = searchFrom 0 {sort (filter n) x^)) where n = length xs 

Unlike general sorfing, we can sorf a lisf of n nafural numbers, all of which are in fhe 
range (0,n), in &{n) steps. For example, we can use Counfsorf (see Answer 5.17): 

select xs = searchFrom 0 {csort n (filter n) x^)) 
where n = length xs 

csort nxs = concat [replicate kx \ (x,k) ■(— assocs a] 

where a = accumArray (-|-)0(0,n) [(x,l) |xt— xi] 

In fact there is no need to produce the sorted list: we can simply look for the first
index with count 0:

select xs = length (takeWhile (/= 0) (elems a))
  where a = accumArray (+) 0 (0,n) [(x,1) | x <- xs, x <= n]
        n = length xs

This algorithm does not depend on the input being a list without duplicates.
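The counting version can be run directly with Data.Array. The following is a minimal sketch, assuming the input consists of non-negative Ints; the names selectCount and counts are chosen here for illustration.

```haskell
import Data.Array (Array, accumArray, elems)

-- Assumption: inputs are non-negative Ints; duplicates are allowed.
selectCount :: [Int] -> Int
selectCount xs = length (takeWhile (/= 0) (elems counts))
  where
    n = length xs
    -- counts ! i is the number of occurrences of i in xs, for 0 <= i <= n;
    -- elements larger than n are discarded before the array is built.
    counts :: Array Int Int
    counts = accumArray (+) 0 (0, n) [(x, 1) | x <- xs, x <= n]
```

For example, selectCount [0,0,1,1] is 2, which illustrates that duplicates do no harm here.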

It is also possible to devise a linear-time divide-and-conquer solution that does not
make use of arrays. The idea is to decompose the list into two equal-size sublists
and then compute the solution recursively by continuing with just one of the sublists,
the same strategy that was used with binary search. Assuming the decomposition
takes Θ(n) steps, we then obtain the recurrence relation

T(n) = T(n/2) + Θ(n)

for the running time T(n), with solution T(n) = Θ(n).
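The stated solution can be checked by unrolling the recurrence. The following is a sketch under two simplifying assumptions that are not in the text: the Θ(n) term is exactly cn for some constant c > 0, and n is a power of two.

```latex
\begin{align*}
T(n) &= T(n/2) + cn \\
     &= T(n/4) + cn/2 + cn \\
     &= T(1) + cn\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{2}{n}\right) \\
     &\le T(1) + 2cn
\end{align*}
```

Since also T(n) ≥ cn, the two bounds together give T(n) = Θ(n).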

We can split xs by partitioning xs on a suitably chosen natural number b (the
choice will be made later on), using the function partition of Quicksort. With
(ys,zs) = partition (<b) xs we have

[0 ..] \\ xs = ([0 .. b-1] \\ ys) ++ ([b ..] \\ zs)

Here is the proof:





  [0 ..] \\ xs
=   { since [0 ..] = [0 .. b-1] ++ [b ..] }
  ([0 .. b-1] ++ [b ..]) \\ xs
=   { since (as ++ bs) \\ xs = (as \\ xs) ++ (bs \\ xs) }
  ([0 .. b-1] \\ xs) ++ ([b ..] \\ xs)
=   { since as \\ xs = (as \\ ys) \\ zs = (as \\ zs) \\ ys }
  (([0 .. b-1] \\ zs) \\ ys) ++ (([b ..] \\ ys) \\ zs)
=   { since [0 .. b-1] \\ zs = [0 .. b-1] and [b ..] \\ ys = [b ..] }
  ([0 .. b-1] \\ ys) ++ ([b ..] \\ zs)

Next, since 

head (as ++ bs) = if null as then head bs else head as

we obtain

select xs = if null ([0 .. b-1] \\ ys)
            then head ([b ..] \\ zs)
            else head ([0 .. b-1] \\ ys)
            where (ys,zs) = partition (<b) xs

Now comes a second key observation. Since ys does not contain duplicates and
every element of ys is less than b, we have

null ([0 .. b-1] \\ ys) = (length ys == b)

Inspection of the code for select suggests that we should generalise select to a 
function, selectFrom say, defined by 

selectFrom :: Nat -> [Nat] -> Nat
selectFrom a xs = head ([a ..] \\ xs)

under the invariant that no element of xs is smaller than a. Then, provided b is
chosen so that the lengths of both partitioned lists are at most half the length of the
original, the following recursive definition of select is well-founded:

select xs = selectFrom 0 xs
selectFrom a xs | null xs            = a
                | length ys == b - a = selectFrom b zs
                | otherwise          = selectFrom a ys
                where (ys,zs) = partition (<b) xs

It remains to choose b. Clearly we want b > a, but we would also like to ensure the
lengths p and q of the two lists ys and zs are as equal as possible. The appropriate
choice to satisfy these requirements is b = a + 1 + ⌊n/2⌋, where n is the length
of xs. If n ≠ 0 and p < b - a, then p ≤ b - a - 1 = ⌊n/2⌋, while if p = b - a, then
q = n - (b - a) = n - ⌊n/2⌋ - 1 ≤ ⌊n/2⌋. As a final optimisation we can avoid
repeatedly computing length by tupling each list with its length. All of that leads to

select xs = selectFrom 0 (length xs, xs)
selectFrom a (n,xs) | n == 0     = a
                    | l == b - a = selectFrom b (n - l, zs)
                    | otherwise  = selectFrom a (l, ys)
                    where (ys,zs) = partition (<b) xs
                          b = a + 1 + n `div` 2
                          l = length ys

This solution also takes linear time. 
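For readers who want to run the final version, here is a self-contained transcription. It is a sketch that assumes Nat is modelled by Int and, as the derivation requires, that the input is a duplicate-free list of non-negative numbers; partition is taken from Data.List.

```haskell
import Data.List (partition)

-- Smallest natural number not in xs (xs must be duplicate-free, elements >= 0).
select :: [Int] -> Int
select xs = selectFrom 0 (length xs, xs)

-- Invariant: n == length xs and no element of xs is smaller than a.
selectFrom :: Int -> (Int, [Int]) -> Int
selectFrom a (n, xs)
  | n == 0     = a
  | l == b - a = selectFrom b (n - l, zs)  -- [a .. b-1] is full: search above b
  | otherwise  = selectFrom a (l, ys)      -- a gap lies below b
  where
    (ys, zs) = partition (< b) xs
    b = a + 1 + n `div` 2
    l = length ys
```

On [3,1,0,5] the first call partitions on b = 3, finds only two elements below 3, and so continues with [1,0], eventually returning 2.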


6.5 Chapter notes 

The lower bound on the number of comparisons required to compute minmax 
(see Answer 6.7) is from Pohl [5]. The linear-time selection algorithm was first 
described by Blum et al. [2]. The selection algorithm derived from Quicksort is due 
to Hoare [3]. It is still not known exactly how many comparisons are required to 
find the median; see Paterson [4]. The problems of selecting from the union of two 
sets and from the complement of a set were treated in [1].

Exercises

Exercise 6.1 Does it make sense to define the kth smallest element of an arbitrary 
list as an element with exactly k - 1 elements smaller than it?

Exercise 6.2 Define combinators cross and pair so that

pair (foldr f1 e1, foldr f2 e2) = foldr (cross . pair (f1,f2)) (e1,e2)

Exercise 6.3 Describe two lists of length n for which the second definition of
minmax uses n - 1 and 2n - 2 comparisons, respectively.

Exercise 6.4 Show that in the divide-and-conquer algorithm for minmax the comparison
count C(n) satisfies C(n) = 3n/2 - 2 when n is a positive power of two.

Exercise 6.5 Show that D(n) = 2(n - 1) in the bottom-up algorithm for minmax.

Exercise 6.6 Suppose a set is given by a balanced binary search tree. Asymptotically,
how long does it take to find the minimum and maximum elements?

Exercise 6.7 Consider a particularly brutal tennis tournament with the twist that 
the tournament has to determine both the best and the worst player. Initially all n 
players are potential champions or potential losers, and the overlap between the 
two groups is n. The best we can do is to play a match between two players in the 
overlap, placing the winner in a potential champion category, and the loser in a 
potential loser category. Assuming n is even, it follows that after n/2 matches the
overlap is reduced to zero. How would you complete the tournament and how many 
matches are there? 

Exercise 6.8 Show that n + ⌈log n⌉ - 2 matches are sufficient to determine the best
and second-best players in a tennis tournament involving n players.

Exercise 6.9 Are there any other ways apart from setting pivot [x] = x to ensure
the definition of the linear-time algorithm for select is well-founded?

Exercise 6.10 Use the rules of floors and ceilings to show that

3⌊(⌈n/5⌉ + 1)/2⌋ - 1 ≥ 3n/10

No case analysis is allowed.

Exercise 6.11 Instead of grouping elements into blocks of 5, suppose we group 
into blocks of 3. Does this lead to a linear-time algorithm for select? How about 
grouping into blocks of 7? 

Exercise 6.12 Counting only comparisons between list elements, how many comparisons
are required to compute select 4 [1..7] when pivot chooses the first element
of a list, and when pivot is as defined in the linear-time version of select? For the
second question you can assume that sorting n elements for 3 ≤ n ≤ 5 requires 3
comparisons for n = 3, 5 comparisons for n = 4, and 7 comparisons for n = 5.

Exercise 6.13 Is <C= a transitive relation, that is, does xs <C= ys and ys <C= zs imply
xs <C= zs?

Exercise 6.14 Suppose xs ++ ys and us ++ vs are both sorted. Prove that xs <C= vs
or us <C= ys.

Exercise 6.15 Assuming arrays are indexed from 0, write down a definition of

select :: Ord a => Nat -> Array Nat a -> Array Nat a -> a

Answers

Answer 6.1 No. Consider the third smallest element of the list [1,2,2,3]. There is 
no element with exactly two smaller elements. 

Answer 6.2 We can define 

pair :: (a -> b, a -> c) -> a -> (b,c)
pair (f,g) x = (f x, g x)

cross :: (a -> c, b -> d) -> (a,b) -> (c,d)
cross (f,g) (x,y) = (f x, g y)

Then we have

(cross . pair (f,g)) x (y,z) = cross (pair (f,g) x) (y,z)
                             = cross (f x, g x) (y,z)
                             = (f x y, g x z)
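To see the law of Exercise 6.2 at work, the two combinators can be tried on a concrete pair of folds. The sketch below is only an illustration: the choice of sum and length as the two folds, and the names twoPasses and onePass, are assumptions made here.

```haskell
pair :: (a -> b, a -> c) -> a -> (b, c)
pair (f, g) x = (f x, g x)

cross :: (a -> c, b -> d) -> (a, b) -> (c, d)
cross (f, g) (x, y) = (f x, g y)

-- Left-hand side of the law: two separate traversals of the list.
twoPasses :: [Int] -> (Int, Int)
twoPasses = pair (foldr (+) 0, foldr (\_ n -> n + 1) 0)

-- Right-hand side of the law: a single traversal computing both results.
onePass :: [Int] -> (Int, Int)
onePass = foldr (cross . pair ((+), \_ n -> n + 1)) (0, 0)
```

Both functions map [3,1,4] to (8,3), the sum and the length of the list.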

Answer 6.3 One worst case is when the input is in ascending order. One best case 
is when the input is in ascending order but with the first and last elements swapped. 

Answer 6.4 When n is a power of two we have C(n) = 2C(n/2) + 2 and a simple
induction yields the answer.

Answer 6.5 The induction step is D(n) = 2⌊n/2⌋ + 2(⌈n/2⌉ - 1) = 2(n - 1).

Answer 6.6 We have to find the leftmost and rightmost elements in the tree, each
of which takes O(log n) steps.

Answer 6.7 After reducing the overlap to zero, the best one can do thereafter is 
to play n/2 — 1 matches among the potential losers to determine the worst player, 
and n/2 — 1 matches among the potential winners to determine the champion. That 
comes to a total of 3n/2 - 2 matches. A similar argument holds for odd n and
shows that ⌈3n/2⌉ - 2 matches are sufficient, which is the same bound as for the
bottom-up algorithm for minmax.

The tennis tournament analogy can be used to show that ⌈3n/2⌉ - 2 matches are
necessary in the worst case to determine both the champion and the worst player. 
We cannot, of course, construct a worst case, because that would depend on the 
particular algorithm being executed. Instead we can use an adversarial argument. 
In this scenario, the adversary chooses the answers to each comparison test asked 
for by a particular algorithm as it runs in order to force a worst case. The only 
restriction on the adversary is that the answers must be consistent with all previous 
answers. Now, at any stage of the tournament there are four possible groups of 
players: those that haven’t played any matches so far (group A), those that have 
played some matches and never lost (group B); those that have played some matches 
and never won (group C); and those that have both won and lost a match (group
D). Let the quadruple (a,b,c,d) denote the number of players in each of these
categories at some stage of the tournament. Any algorithm starts with (n,0,0,0)
and ends with (0,1,1,n - 2). The adversary can always arrange that the answer to
each comparison test either leaves (a,b,c,d) unchanged or else produces one of the
following quadruples (as long as all values are nonnegative):

(a - 2, b + 1, c + 1, d)    (a - 1, b, c + 1, d)    (a - 1, b + 1, c, d)
(a, b - 1, c, d + 1)        (a, b, c - 1, d + 1)

In the first case, a match between two group A players - an AA match - will always 
produce one extra member of group B and one extra member of group C. In the 
second case, the adversary can arrange that an AB match produces only one extra 
member of group C (by having the player in group B win). And so on, for each of 
the ten cases AA, AB, AC, AD, BB, BC, BD, CC, CD, and DD. The final step is to
consider the value k = 3a + 2b + 2c. At the start of the tournament k = 3n and at
the conclusion k = 4. But the value of k can decrease by at most two at each step,
so it takes at least ⌈(3n - 4)/2⌉ = ⌈3n/2⌉ - 2 matches to determine the outcome.

Answer 6.8 It requires n - 1 matches to determine the best player. Any player who
lost to the eventual winner of the tournament may be the second-best player. Since
there are ⌈log n⌉ players who lost to the eventual winner, a second tournament of
⌈log n⌉ - 1 matches can be played to determine the second-best player. That gives a
total of n + ⌈log n⌉ - 2 matches. Using another adversarial argument, it can also be
shown that this number of matches is necessary.

Answer 6.9 Yes, either set select 1 [x] = x or set median [x] = x.

Answer 6.10 The proof goes as follows: 

   3k - 1 ≤ 3⌊(⌈n/5⌉ + 1)/2⌋ - 1
⇔    { arithmetic of integers }
   k ≤ ⌊(⌈n/5⌉ + 1)/2⌋
⇔    { rule of floors }
   2k - 1 ≤ ⌈n/5⌉
⇔    { arithmetic of integers }
   2k - 2 < ⌈n/5⌉
⇔    { (contrapositive) rule of ceilings }
   10k - 10 < n
⇔    { arithmetic }
   k ≤ (n + 9)/10

Hence

3n/10 ≤ 3(n + 9)/10 - 1 ≤ 3⌊(⌈n/5⌉ + 1)/2⌋ - 1

Answer 6.11 No, dividing into blocks of 3 will not give a linear-time algorithm.
We have

2⌊(⌈n/3⌉ + 1)/2⌋ - 1 ≥ n/3

so the associated recurrence relation is

T(n) = T(n/3) + T(2n/3) + Θ(n)

whose solution is Θ(n log n). However, dividing into blocks of 7 is okay because
the associated recurrence relation is

T(n) = T(n/7) + T(5n/7) + Θ(n)

whose solution is T(n) = O(n).

Answer 6.12 When the pivot is chosen to be the first element of the list, the value
of select 4 [1..7] is obtained by computing partition3 p [p..7] for p = 1,2,3,4.
Partitioning a list of n elements requires n comparisons, so 7 + 6 + 5 + 4 = 22
comparisons are required.

To answer the second question, the calling structure with associated comparison
counts is given by

select 4 [1..7]
  pivot [1..7]
    sort [1..5]          (7 comparisons)
    sort [6,7]           (1 comparison)
    select 1 [3,6]       (4 comparisons)
  partition3 3 [1..7]    (7 comparisons)
  select 1 [4..7]        (11 comparisons)

The counts, 4 and 11, for the recursive calls select 1 [3,6] and select 1 [4..7] of
select are given by

select 1 [3,6]
  pivot [3,6]
    sort [3,6]           (1 comparison)
    select 1 [3]         (1 comparison)
  partition3 3 [3,6]     (2 comparisons)

for a total count of 4 and

select 1 [4..7]
  pivot [4..7]
    sort [4..7]          (5 comparisons)
    select 1 [5]         (1 comparison)
  partition3 5 [4..7]    (4 comparisons)
  select 1 [4]           (1 comparison)

for a total count of 11. Calls of the form select 1 [x] each take one comparison. That
gives a grand total of 30 comparisons.

Answer 6.13 No, not if ys is empty. Transitivity holds only for nonempty lists. 

Answer 6.14 The result is immediate if any of the four lists is empty. Otherwise
either last xs ≤ head vs or last us ≤ head ys because if both are false, then

head vs < last xs ≤ head ys < last us

contradicting the assumption that us ++ vs is sorted.

Answer 6.15 To help understand the program below, suppose xa[0..n-1] denotes
an array of n elements. Then bounds xa = (0,n-1). The segment xa[lx..rx] of xa
has length rx - lx + 1, and so is empty if lx = rx + 1. The midpoint of the segment
is xa ! p, where p = (lx + rx) div 2. The element at position k, counting from 0, is at
position lx + k in the segment, and at position k + lx + ly in the result of merging
xa[lx..rx] with ya[ly..ry]. With that understood, the function

select :: Ord a => Nat -> Array Nat a -> Array Nat a -> a

is defined by

select k xa ya = search k (bounds xa) (bounds ya) where
  search k (lx,rx) (ly,ry)
    | lx == rx + 1                   = ya ! (ly + k)
    | ly == ry + 1                   = xa ! (lx + k)
    | a <= b && k + lx + ly <= p + q = search k (lx,rx) (ly,q-1)
    | a <= b && k + lx + ly >  p + q = search (k + lx - p - 1) (p+1,rx) (ly,ry)
    | b <= a && k + lx + ly <= p + q = search k (lx,p-1) (ly,ry)
    | b <= a && k + lx + ly >  p + q = search (k + ly - q - 1) (lx,rx) (q+1,ry)
    where p = (lx + rx) `div` 2
          q = (ly + ry) `div` 2
          a = xa ! p
          b = ya ! q

Many computational problems involve selecting some best candidate from a set of
possible candidates. Candidates can be lists, trees, layouts of a document, routes
in a network of roads, and so on. The best candidate may be the shortest list, the
least wasteful paragraph, or the quickest route. Even sorting can be regarded as 
an optimisation problem; after all, the aim of a sorting algorithm is to find some 
permutation of the input that minimises the number of out-of-order elements. Greedy 
sorting algorithms will be our first topic in Chapter 7. In general there may be more 
than one best candidate and the task is to find just one of them. 

The input to such problems is normally not the set of candidates but a list of 
components out of which the candidates can be built. For example, the raw materials 
may be a list of words that constitute the paragraph, a list of numbers that form 
the fringe of a tree, or a list of towns and roads in a shortest-path algorithm. A 
greedy algorithm solves such a problem in a step-by-step fashion by constructing a 
single best partial candidate at each step. A partial candidate may be a fully formed 
candidate for the components used so far in its construction, but it may be something 
more general. The idea of a step-by-step algorithm is easy to grasp intuitively but 
not so easy to formalise, especially in a purely functional language. For example, a 
divide-and-conquer algorithm for sorting will minimise the number of out-of-order 
elements but it is not, conceptually at least, a step-by-step algorithm. Finally, in most 
of our examples a best candidate is one that minimises some notion of cost, so the 
word ‘greedy’ may seem a little inappropriate. Perhaps ‘frugal’ or ‘parsimonious’ 
would be better adjectives. However, the name ‘greedy’ has become the standard 
way of referring to these algorithms. 

The idea of maintaining a single best candidate at each step will not always lead 
to a best final candidate. Suppose you are out walking on a hillside and wish to 
climb to the highest point. If you are in a mist and cannot see where to go, you may 
decide on the strategy of choosing to walk along a path of steepest ascent at each 
step. That may work, but it may also lead to the top of a little hillock when there is 
a much bigger hill in the background. The same is true of greedy algorithms (which 
have also been called hillclimbing algorithms): you may get to a locally optimal 
solution that is not globally optimal. We will say that a greedy algorithm works if it 
does lead to a globally optimal solution. 

Greedy algorithms can be tricky things. The trickiness is not in the algorithm 
itself, which is usually quite short and easy to understand, but in the proof that 
it does produce a best solution. With a greedy algorithm the correctness of the 
program is less obvious than with, say, a sorting algorithm; after all we did not 
spend any time in previous chapters on proving that the various sorting algorithms 
actually did sort. The main difficulty with proving that a greedy algorithm works is 
that for many problems equational reasoning is simply not up to the task and has
to be replaced by reasoning about refinement. We will see what this entails in due
course. 

In the following chapters we consider a number of greedy algorithms. Rather 
than just giving the algorithm and proving it works, we will take a more structured
approach, one that will pay dividends when we come to discuss dynamic
programming and exhaustive search algorithms. First we show how to define a 
function candidates that generates the set of all possible candidates. This function 
may be defined recursively or by using a suitable higher-order function such as 
foldr, until, or apply (a function that applies another function a given number of 
times). Sometimes more than one style is available, each leading to an equally 
clear definition, so it is a free choice as to which one to employ. Next we define 
the selection criterion explicitly. This can also sometimes be done in various ways 
because a given greedy algorithm may work for more than one cost function. The 
choice of how the candidates are generated and how the cost function is defined can 
have a significant effect on the ease with which the correctness of the algorithm is 
proved. Finally, the generation and selection functions are combined, or fused, into 
one function. When candidates is defined as an instance of a standard higher-order 
function such as foldr we can appeal to standard fusion conditions to carry out 
the fusion step. We have already seen this two-stage process with sorting, when 
algorithms were described in terms of building a tree and then flattening it. The 
deep structure of an algorithm is revealed by expressing it in terms of more basic 
components and then combining them. 



Chapter 7 


Greedy algorithms on lists 


This chapter deals with three problems in which the candidates are lists. The
problems are drawn from three different areas of computing, and appear to have
nothing in common. Nevertheless, they can all be solved by an appropriate greedy
algorithm and the method for obtaining the greedy algorithm is the same in all three 
cases. The problems will repay careful study, so we will try to take things slowly. 
To set the scene, we begin with an abstract formulation of the essential ingredients 
behind a successful greedy algorithm. 


7.1 A generic greedy algorithm 

The following function mcc selects a candidate with minimum cost: 

mcc :: [Component] -> Candidate
mcc = minWith cost . candidates

This function is defined as the composition of a function candidates that builds a 
finite list of candidates out of a list of components, and a function minWith cost that 
selects a candidate with minimum cost. The function minWith can be defined in the 
following way (alternatives are discussed in the exercises): 

minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (smaller f)
  where smaller f x y = if f x <= f y then x else y

The function foldr1 was introduced in the previous chapter. Since foldr1 returns
the undefined value when applied to an empty list, so does minWith. Thus minWith
returns a well-defined value only when applied to finite nonempty lists. If there is
more than one candidate with minimum cost, then the above definition selects the
first such candidate on the list. Changing smaller to read

smaller f x y = if f x < f y then x else y

would mean that the last candidate with minimum cost is selected. Consequently,
the result returned by minWith depends on the precise order in which candidates are
generated. As will be appreciated by the end of the chapter, this fact will seriously
restrict our ability to reason equationally about greedy algorithms. The function
candidates takes a finite list of components, whatever they may be, and returns
a finite nonempty list of candidates. Candidate construction can be achieved in a
number of ways, but for the moment we will focus on one that uses foldr:

candidates :: [Component] -> [Candidate]
candidates xs = foldr step [c0] xs
  where step x cs = concatMap (extend x) cs

Here c0 is some default partial candidate for an empty list of components. We could
have written

candidates = foldr (concatMap . extend) [c0]

but step is certainly a shorter name. The type of extend is

extend :: Component -> Candidate -> [Candidate]

This function takes a component and a candidate and returns a finite list of extended 
candidates. The fully formed candidates are those constructed when all the components
have been processed. If the candidates were, say, the permutations of a list,
then c0 would be the empty list and extend x would be a list of all the ways x can be
inserted into a given permutation. For example,

extend 1 [2,4,3] = [[1,2,4,3],[2,1,4,3],[2,4,1,3],[2,4,3,1]]

It is assumed in what follows that extend x returns a nonempty finite list of candidates
for all x.
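The ingredients introduced so far can be assembled into a small runnable sketch. It instantiates the abstract types for the permutation example just mentioned, so components are list elements, candidates are lists, and c0 is the empty list; that instantiation is an assumption made here for illustration, not part of the generic scheme.

```haskell
-- Select the first element with minimum f-value (undefined on empty lists).
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (smaller f)
  where smaller g x y = if g x <= g y then x else y

-- All ways of inserting x into a given list.
extend :: a -> [a] -> [[a]]
extend x []       = [[x]]
extend x (y : ys) = (x : y : ys) : map (y :) (extend x ys)

-- Candidates built with foldr, starting from the single candidate c0 = [].
candidates :: [a] -> [[a]]
candidates = foldr step [[]]
  where step x cs = concatMap (extend x) cs
```

With these definitions extend 1 [2,4,3] produces the four lists shown above, candidates [1,2,3] has six elements, and minWith snd [('a',0),('b',0)] returns ('a',0), the first of the two candidates with minimum cost.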

A greedy algorithm for computing mcc arises as the result of successfully fusing
minWith cost with candidates. Operationally speaking, instead of building the
complete list of candidates and then selecting a best one, we construct a single best
candidate at each step. We met the fusion rule for foldr in Chapter 1, but here it is
again: we have

h (foldr f e xs) = foldr g e' xs

for all finite lists xs, provided e' = h e and the fusion condition

h (f x y) = g x (h y)

holds for all x and y. For our problem, h = minWith cost and f = step, but g is
unknown. The fusion condition reads

minWith cost (step x cs) = gstep x (minWith cost cs)

for some function gstep (a ‘greedy step’). To see if it holds, and to discover gstep in 
the process, we can reason as follows:

  minWith cost (step x cs)
=   { definition of step }
  minWith cost (concatMap (extend x) cs)
=   { distributive law (see below) }
  minWith cost (map (minWith cost . extend x) cs)
=   { define gstep x = minWith cost . extend x }
  minWith cost (map (gstep x) cs)
=   { greedy condition (see below) }
  gstep x (minWith cost cs)

The distributive law used in the second step is the fact that

minWith f (concat xss) = minWith f (map (minWith f) xss)

provided xss is a finite list of finite nonempty lists. Equivalently,

minWith f (concatMap g xs) = minWith f (map (minWith f . g) xs)

provided xs is a finite list and g returns finite nonempty lists. The proof of the
distributivity law is left as Exercise 7.3.
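Before attempting the proof it can help to test the distributive law on an example. The sketch below spells out both sides; the names lawLhs and lawRhs are introduced here for illustration, and a single test case is of course no substitute for the proof.

```haskell
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (smaller f)
  where smaller g x y = if g x <= g y then x else y

-- Left-hand side: flatten first, then take the minimum.
lawLhs :: Ord b => (a -> b) -> [[a]] -> a
lawLhs f xss = minWith f (concat xss)

-- Right-hand side: take the minimum of each list, then the minimum of those.
lawRhs :: Ord b => (a -> b) -> [[a]] -> a
lawRhs f xss = minWith f (map (minWith f) xss)
```

With cost function abs and input [[3,-1,2],[1,-4],[5]] both sides return -1, the first element of minimum cost, even though 1 has the same cost.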

Summarising this short calculation, we have shown that 

mcc = foldr gstep c0 where gstep x = minWith cost . extend x

provided the following greedy condition holds:

minWith cost (map (gstep x) cs) = gstep x (minWith cost cs)

That all seems simple enough, so let’s look at some concrete examples. 


7.2 Greedy sorting algorithms 

Here is one specification of the function sort that sorts a list into ascending order: 

sort :: Ord a => [a] -> [a]
sort = minWith ic . perms

The function ic :: Ord a => [a] -> Int, short for ‘inversion count’, counts the number
of inversions in a list. The notion of an inversion is one of the first concepts that
arise in the study of the combinatorial properties of permutations. An inversion is a
pair of elements that are out of place, so (x,y) is an inversion if x appears before
y in the list but x > y. For example, ic [7,1,2,3] = 3 and ic [3,2,1,7] = 3. We can
define ic by

ic :: Ord a => [a] -> Int
ic xs = length [(x,y) | (x,y) <- pairs xs, x > y]

pairs :: [a] -> [(a,a)]
pairs xs = [(x,y) | x:ys <- tails xs, y:zs <- tails ys]

The function pairs returns a list of all pairs of elements in a list in the order they
appear in the list, and ic counts the number of pairs for which the first component
is greater than the second. A list with minimum inversion count has count 0 and
is a list in ascending order. Two distinct permutations cannot both be in ascending
order, so there is only one permutation of a list that minimises ic, namely the sorted
permutation.
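The specification is executable, if hopelessly slow. Here is a minimal sketch; it uses permutations from Data.List as a stand-in for perms (an assumption made only to keep the example short) and calls the result sortSpec to avoid clashing with the library's sort.

```haskell
import Data.List (permutations, tails)

-- Inversion count: the number of pairs that are out of order.
ic :: Ord a => [a] -> Int
ic xs = length [() | (x, y) <- pairs xs, x > y]

-- All pairs of elements, in the order they appear in the list.
pairs :: [a] -> [(a, a)]
pairs xs = [(x, y) | x : ys <- tails xs, y : _ <- tails ys]

minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (smaller f)
  where smaller g x y = if g x <= g y then x else y

-- Sorting by exhaustive search over all permutations (exponential time).
sortSpec :: Ord a => [a] -> [a]
sortSpec = minWith ic . permutations
```

Both ic [7,1,2,3] and ic [3,2,1,7] evaluate to 3, as claimed in the text, and sortSpec [3,1,2] is [1,2,3].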

The function perms can be defined in various ways, including by a divide-and- 
conquer algorithm - see the exercises. Here is the first method used in Chapter 1: 

perms :: [a] -> [[a]]
perms = foldr (concatMap . extend) [[]]

extend :: a -> [a] -> [[a]]
extend x []     = [[x]]
extend x (y:xs) = (x:y:xs) : map (y:) (extend x xs)

The function extend inserts a new element into a list in all possible positions. In 
particular, the function gstep, where 

gstep x = minWith ic . extend x

inserts a new element into a list so as to minimise the inversion count of the result.
For example,

gstep 6 [7,1,2,3] = [7,1,2,3,6]
gstep 6 [3,2,1,7] = [3,2,1,6,7]

The first list has inversion count 4, while the second has inversion count 3. The 
greedy condition for sort is the assertion 

minWith ic (map (gstep x) xss) = gstep x (minWith ic xss)

for all x and xss. However, this assertion is false. Take xss = [[7,1,2,3],[3,2,1,7]].
We have

minWith ic xss = [7,1,2,3]

because both lists have inversion count 3 and minWith returns the first list with the
smallest inversion count. Hence

gstep 6 (minWith ic xss) = [7,1,2,3,6]

with inversion count 4, while

minWith ic (map (gstep 6) xss) = [3,2,1,6,7]

with inversion count 3. The greedy condition therefore fails. Of course, the greedy 
condition does hold if we swap the order of the two lists in xss, but what if xss were 
a longer list of permutations? It is not clear that we can always reorder a list of 
candidates to ensure that the greedy condition holds. In any case, such a step is 
the wrong route to take, because candidate generation should be an independent
activity from finding one with minimum cost. We therefore appear to be well and
truly stuck.
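The failure of the greedy condition is easy to reproduce. The following sketch evaluates both sides for the counterexample above; the element type is fixed to Int and the names lhs and rhs are chosen here purely for illustration.

```haskell
import Data.List (tails)

-- Inversion count, written directly over the tails of the list.
ic :: [Int] -> Int
ic xs = length [() | x : ys <- tails xs, y <- ys, x > y]

minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (smaller f)
  where smaller g x y = if g x <= g y then x else y

extend :: a -> [a] -> [[a]]
extend x []       = [[x]]
extend x (y : ys) = (x : y : ys) : map (y :) (extend x ys)

-- Insert x so as to minimise the inversion count of the result.
gstep :: Int -> [Int] -> [Int]
gstep x = minWith ic . extend x

xss :: [[Int]]
xss = [[7,1,2,3],[3,2,1,7]]

lhs, rhs :: [Int]
lhs = minWith ic (map (gstep 6) xss)  -- extend every candidate, then select
rhs = gstep 6 (minWith ic xss)        -- select first, then extend
```

Here lhs is [3,2,1,6,7] with inversion count 3, while rhs is [7,1,2,3,6] with inversion count 4, so the two sides of the greedy condition differ.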

There are three possible solvents to free us from this sticky situation. Two of
them will be described now, but the third and most important one will be left to the
end of the chapter.

The first way of freeing ourselves is to use context-sensitive fusion. We discussed
this in Exercise 1.17, but here is the essential point again. Although the fusion
condition

h (f x y) = g x (h y)

for all x and y is sufficient to establish the fusion rule for foldr, it is not a necessary
one. All that does have to be shown is that the fusion condition holds for all x and
all y of the form y = foldr f e xs. This version of the fusion condition is called
context-sensitive fusion. That means, in the case of sorting, that all we have in fact
to show is the context-sensitive fusion condition

minWith ic (map (gstep x) (perms xs)) = gstep x (minWith ic (perms xs))

for all x and xs. Luckily, this condition does hold. The proof follows from the fact
that there is a unique permutation that minimises ic, namely the ordered permutation
with inversion count 0, and that

ic xs = 0 ⇒ ic (gstep x xs) = 0

Sometimes context-sensitive fusion is not enough to establish the greedy condition. 
This happens when there is no unique candidate with minimum cost. The second 
way of becoming unstuck is to change the cost function. Suppose you are climbing 
some hills and want to reach a highest point, of which there may be more than one.
At each step you can take the steepest ascending path, a strategy that may or may not 
work. Nevertheless, it may also be the case that there is a unique point in the climb 
with the best view, and that point is also a highest one. An alternative strategy at 
each step is to take the path that best improves the view. The proof that this strategy 
works may go through when the first one does not. We will see many examples of 
this trick in the following chapters. 

In the case of sorting there is a simple alternative to ic, in fact an alternative that 
can be accomplished at a stroke! It is to replace ic with id, the identity function. We 
did not claim that the cost function had to return a single numerical value. We have 
minWith id = minimum, so sort = minimum . perms. In words, the sorted permutation
is the lexically least permutation. The context-sensitive greedy condition in this
case reads

minimum (map (gstep x) (perms xs)) = gstep x (minimum (perms xs))

where gstep x xs = minimum (extend x xs). As before, the greedy condition holds
because the sorted permutation is the unique permutation that minimises id, and

sorted xs ⇒ sorted (gstep x xs)

When xs is a sorted list we can define gstep x xs by

gstep x []     = [x]
gstep x (y:xs) = if x <= y then x:y:xs else y : gstep x xs

The result, namely sort = foldr gstep [], is a simple sorting algorithm usually known
as Insertion sort.
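The derived algorithm runs as it stands. The sketch below transcribes it, with the sorting function renamed to isort (a name chosen here) so that it does not clash with any library sort.

```haskell
-- Insert x into an already sorted list, keeping it sorted.
gstep :: Ord a => a -> [a] -> [a]
gstep x []       = [x]
gstep x (y : ys) = if x <= y then x : y : ys else y : gstep x ys

-- Insertion sort: one greedy step per element, working from the right.
isort :: Ord a => [a] -> [a]
isort = foldr gstep []
```

For example, isort [3,1,4,1,5,9,2,6] evaluates to [1,1,2,3,4,5,6,9].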

Insertion sort is not the only greedy sorting algorithm. Here is the other definition 
of perms from Chapter 1: 

perms [] = [[]]
perms xs = concatMap subperms (picks xs)
  where subperms (x,ys) = map (x:) (perms ys)

picks []     = []
picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs]

The function picks picks an arbitrary element from a list in all possible ways,
returning both the element and what remains. This version of perms is defined
recursively rather than through the use of foldr. Nevertheless, it is straightforward
to fuse minimum and perms. Firstly, we have

minimum (perms []) = minimum [[]] = []

Secondly, for nonempty xs we reason as follows: 

  minimum (perms xs)
=   { above definition of perms }
  minimum (concatMap subperms (picks xs))
=   { distributive law }
  minimum (map (minimum . subperms) (picks xs))
=   { claim: see below }
  minimum (subperms (minimum (picks xs)))
=   { suppose (x,ys) = minimum (picks xs) }
  minimum (subperms (x,ys))
=   { definition of subperms }
  minimum (map (x:) (perms ys))
=   { since minimum . map (x:) = (x:) . minimum on nonempty lists }
  x : minimum (perms ys)

The claim takes the form 

minimum . map f = f . minimum

where f = minimum . subperms. It is left as an exercise to show that the claim holds
if f is a monotonic function, that is, x ≤ y ⇒ f x ≤ f y. To verify the claim for
minimum . subperms, suppose (x1,ys1) and (x2,ys2) are two picks of the same list.
It is easy to check that

(x1,ys1) ≤ (x2,ys2) ⇒ x1 : sort ys1 ≤ x2 : sort ys2

But, as we have seen, minimum (subperms (x,ys)) = x : sort ys so the claim follows.
We have therefore shown

sort [] = []
sort xs = x : sort ys where (x,ys) = pick xs

where pick xs = minimum (picks xs). The function pick takes quadratic time, but it
can be implemented to take linear time, see Exercise 7.10. The result is another
well-known algorithm for sorting called Selection sort. Both Insertion sort and
Selection sort take quadratic time in the worst case, so they are not fast. But they
are simple.
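Selection sort can likewise be run directly from the derivation. The sketch below keeps the quadratic-time definition of pick from the text rather than the linear-time version of Exercise 7.10, and renames the sorting function to ssort (a name chosen here).

```haskell
-- Every way of picking one element from a list, with what remains.
picks :: [a] -> [(a, [a])]
picks []       = []
picks (x : xs) = (x, xs) : [(y, x : ys) | (y, ys) <- picks xs]

-- The least pick in the ordering on pairs: a smallest element and the rest.
pick :: Ord a => [a] -> (a, [a])
pick xs = minimum (picks xs)

-- Selection sort: repeatedly pick a smallest element.
ssort :: Ord a => [a] -> [a]
ssort [] = []
ssort xs = x : ssort ys
  where (x, ys) = pick xs
```

For example, picks [1,2,3] is [(1,[2,3]),(2,[1,3]),(3,[1,2])] and ssort [3,1,4,1,5] is [1,1,3,4,5].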


7.3 Coin-changing 

Our second problem is about giving change in coins. Suppose you were a cashier in 
a supermarket and had to give 2.56 in change to a customer. How would you do it? 
Pause for a moment to answer this question. 


We cannot answer this question for you because we do not know your nationality 
and the currency of your country (though we have assumed in the statement of the 
question that it is a decimal currency). The denominations of the available coins 
have to be known. In the United States the denominations are a penny (1c), a nickel
(5c), a dime (10c), a quarter (25c), a half-dollar (50c), and a dollar ($1). In the UK
they are 1p, 2p, 5p, 10p, 20p, 50p, £1, and £2. (Even 50 years after decimalisation,
there are no nicknames for UK coins.) Pre-decimalisation, the UK coinage system
was a very odd one, consisting of a halfpenny (0.5d), a penny (1d), a threepence
(3d), a sixpence (6d), a shilling (12d), a florin (24d), and a half-crown (30d). The
British coped with this system somehow, but fortunately it is now consigned to
history. Note that, whatever the system, there has to be a coin that allows change of
one unit of currency, be it a penny, a halfpenny, or a cent.*

We also cannot answer the question until you say what criterion you are adopting 
for giving the change. Are you trying to minimise the number of coins in the change 
or maybe the total weight? Although some people delight in carrying around lots 
of loose change, the coins can weigh a lot in the pocket or handbag, so maybe 
minimum weight should be the criterion to go for. The weights in grams of the UK 
and US coins are given in the following table: 

Denomination      1      2      5     10     20     25     50    100    200
UK             3.56   7.12   3.25    6.5    5.0      -    8.0    9.5   12.0
US              2.5      -    5.0   2.27      -   5.67  11.54    8.1      -

* That is not strictly true. For example the smallest coin in Australia is 5c but one can buy items for, say, $9.99.
When paying by cash with a $10 note, the cashier rounds the amount down to $9.95 and gives 5c change.


Inspection shows that for UK currency each coin of a given denomination weighs 
no more than the value of the coin in smaller denominations. But that is not quite 
enough to prove that minimising the number of coins also minimises the total weight. 
In US currency two quarters weigh 11.34 grams, which is less than the weight of 
a half-dollar (11.54 grams). And a quarter and a nickel together weigh 10.67 grams, 
while three dimes weigh only 6.81 grams. Certainly in the United States minimising 
the number of coins does not minimise their total weight. The statement is true for 
UK currency, as one can check by exhaustive search (see the exercises). 

There is an obvious greedy algorithm for minimising the number of coins: at 
each step give the customer a coin of largest value as long as it is no larger than 
the remaining amount. Cashiers regularly adopt such a strategy the world over. For 
$2.56 that would mean five coins: 

2 × $1 + 1 × 50c + 1 × 5c + 1 × 1c

For £2.56 that would mean four coins:

1 × £2 + 1 × 50p + 1 × 5p + 1 × 1p

Does the greedy algorithm work? The answer is no, not necessarily: it depends 
on the coinage system. Before 1971 when decimalisation occurred in the UK, a 
greedy algorithm for giving change of 48d would use three coins, a half-crown, a 
shilling and a sixpence, whereas two florins would suffice. Another reason to be 
grateful for decimalisation. As a simpler example, with denominations [4,3,1] the 
greedy algorithm would give three coins for a change of 6 units, while two coins of 
denomination 3 would suffice. 

To specify the problem, suppose we are given a list of denominations in decreasing 
order, ending with a denomination of 1. For example, 

type Denom = Nat 
type Tuple = [Nat] 

usds, ukds:: [Denom] 
usds = [100,50,25,10,5,1] 
ukds = [200,100,50,20,10,5,2,1] 

By definition, a tuple is a list of natural numbers, of the same length as the list of 
denominations, representing the given change (we prefer ‘tuple’ to ‘change’ simply 
because ‘tuples’ reads better than ‘changes’). For example, [2,1,0,0,1,1] represents
$2.56 in US currency. The amount a tuple represents is given by

amount :: [Denom] → Tuple → Nat
amount ds cs = sum (zipWith (×) ds cs)

The number of coins in a tuple, its count, is defined by count = sum. We can now
define

mkchange :: [Denom] → Nat → Tuple
mkchange ds = minWith count · mktuples ds

The function mktuples gives all possible ways of making change for a given amount
with the given denominations. One simple definition is

mktuples :: [Denom] → Nat → [Tuple]
mktuples [1] n = [[n]]
mktuples (d : ds) n = [c : cs | c ← [0 .. n div d], cs ← mktuples ds (n − c × d)]
By assumption the last coin has denomination 1, so to make up change of n using 
just this coin we have to use n coins. Otherwise, for the next denomination d any 
number c in the range 0 ≤ c ≤ ⌊n/d⌋ can be chosen. The rest of the computation is
a recursive call with the remaining denominations and the remaining amount n − c d.
Another reasonable definition of mktuples based on foldr is given as Exercise 7.14. 

The function mktuples can return a long list of candidate tuples. For example, 

length (mktuples usds 256) = 6620
length (mktuples ukds 256) = 223195
Hence computation with the above definition of mkchange is quite slow. 
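
The specification can be run directly on small inputs. The following is a minimal sketch under two assumptions that are not part of the text above: Nat is modelled by Int, and minWith is written out so that it returns the first element of least cost, as described for the definition in the previous section.

```haskell
type Nat   = Int      -- assumption: natural numbers modelled by Int
type Denom = Nat
type Tuple = [Nat]

-- First element of a nonempty list with least cost.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 smaller
  where smaller x y = if f x <= f y then x else y

amount :: [Denom] -> Tuple -> Nat
amount ds cs = sum (zipWith (*) ds cs)

count :: Tuple -> Nat
count = sum

-- All ways of making change for n; the denominations must end with 1.
mktuples :: [Denom] -> Nat -> [Tuple]
mktuples [1] n      = [[n]]
mktuples (d : ds) n =
  [c : cs | c <- [0 .. n `div` d], cs <- mktuples ds (n - c * d)]
mktuples [] _       = error "mktuples: denominations must end with 1"

-- Exhaustive specification: generate every tuple, keep one of least count.
mkchange :: [Denom] -> Nat -> Tuple
mkchange ds = minWith count . mktuples ds
```

With denominations [4,3,1] and an amount of 6 there are only four candidate tuples, and mkchange [4,3,1] 6 selects [0,2,0], the two coins of denomination 3 mentioned earlier.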

There is another important feature of mkchange to take into account: unlike the 
case of sorting there may be more than one tuple with minimum count. For example, 
take the denominations [7,3,1]. Then both [6,4,0] and [7,1,2] are tuples for 54 
units of change with minimum count 10. The definition of minWith given in the 
previous section chooses the first tuple with minimum count in the list of candidates. 
That means the above definition of mkchange returns [6,4,0] (why?) but the greedy 
algorithm, as outlined in the preamble, chooses the second tuple [7,1,2]. These 
results are different and again we seem to be stuck. One can resolve this difficulty 
by modifying the definition of mktuples to produce the tuples in a different order. 
But two wrongs do not make a right and this is not the path to take. One alternative, 
as we have seen, is to change the cost function. 

This time we replace minWith count by maxWith id. Thus we define 

mkchange ds = maximum · mktuples ds

since maxWith id = maximum. Instead of choosing a tuple with minimum count, 
mkchange chooses the lexically largest tuple. Whether or not the largest tuple is 
also one with minimum count depends on the denominations of the coins. We return 
to this essential point in a short while. Note that, while there may be more than one 
tuple with minimum count, there is always a unique largest tuple. 

So, let us calculate it. The base case

mkchange [1] n = [n]

is immediate. For the induction step we first rewrite the definition of mktuples to
avoid an explicit list comprehension:

mktuples [1] n = [[n]]
mktuples (d : ds) n = concatMap (extend ds) [0 .. n div d]
  where extend ds c = map (c:) (mktuples ds (n − c × d))

The translation is straightforward and details are omitted. The advantage of higher-order
functions such as concatMap over list comprehensions is that the rules of the
game can be stated more simply. In particular,

maximum (concatMap f xs) = maximum (map (maximum · f) xs)
maximum (map (x:) xs) = x : maximum xs

for all finite nonempty lists. The first law is an instance of the distributive law of the
previous section; as before it is valid only if f returns a finite nonempty list. That is
not a problem here because extend does return such a list. The second is not valid
if xs is the empty list (why?), but that is also not a problem here because mktuples
returns a nonempty list.

We now reason as follows: 

  mkchange (d : ds) n
=   { definition }
  maximum (mktuples (d : ds) n)
=   { definition of mktuples with m = n div d }
  maximum (concatMap (extend ds) [0 .. m])
=   { first law above }
  maximum (map (maximum · extend ds) [0 .. m])

We continue with the inner term: 

  maximum (extend ds c)
=   { definition of extend }
  maximum (map (c:) (mktuples ds (n − c × d)))
=   { second law above }
  c : maximum (mktuples ds (n − c × d))
=   { definition of mkchange }
  c : mkchange ds (n − c × d)

Hence 

  maximum (map (maximum · extend ds) [0 .. m])
=   { above }
  maximum [c : mkchange ds (n − c × d) | c ← [0 .. m]]
=   { definition of lexicographic maximum }
  m : mkchange ds (n − m × d)

That gives the greedy algorithm:

mkchange :: [Denom] → Nat → Tuple
mkchange [1] n = [n]
mkchange (d : ds) n = c : mkchange ds (n − c × d) where c = n div d

At each step the maximum number of coins of the next denomination is chosen. For 
example, 

mkchange ukds 256 = [1,0,1,0,0,1,0,1]
mkchange usds 256 = [2,1,0,0,1,1] 
mkchange [7,3,1] 54 = [7,1,2] 

All of the calculation above is valid for the earlier definition of mkchange in terms 
of minimising count, except for the very last step. 
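
The greedy algorithm just derived runs as it stands once the typeset symbols are replaced by their Haskell spellings. A minimal sketch, assuming Nat is modelled by Int and adding an error clause (not in the text) for denomination lists that do not end with 1:

```haskell
type Nat   = Int      -- assumption: natural numbers modelled by Int
type Denom = Nat
type Tuple = [Nat]

usds, ukds :: [Denom]
usds = [100, 50, 25, 10, 5, 1]
ukds = [200, 100, 50, 20, 10, 5, 2, 1]

-- Greedy change: take as many coins of each denomination as possible,
-- working from the largest denomination down to 1.
mkchange :: [Denom] -> Nat -> Tuple
mkchange [1] n      = [n]
mkchange (d : ds) n = c : mkchange ds (n - c * d)
  where c = n `div` d
mkchange [] _       = error "mkchange: denominations must end with 1"
```

The running time is linear in the number of denominations, in contrast with the exhaustive specification, which has to build the whole list of candidate tuples.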

Finally, but crucially, we revisit the question of when mkchange does produce a 
tuple with minimum count. Equivalently, when is the lexically largest tuple also the 
one with minimum count? 

Let us prove this is the case for UK currency (the case of US currency is slightly 
simpler and left as an exercise). Let [c8, c7, ..., c1] be a tuple with minimum count
and [g8, g7, ..., g1] be the tuple returned by the greedy algorithm, namely the lexically
largest tuple. The aim is to show that cj = gj for 1 ≤ j ≤ 8, so the largest tuple for
UK currency is the unique tuple with minimum count. This is not necessarily true
for other currencies for which the greedy algorithm works. The amount A in the
change satisfies

A = 200 c8 + 100 c7 + 50 c6 + 20 c5 + 10 c4 + 5 c3 + 2 c2 + c1
A = 200 g8 + 100 g7 + 50 g6 + 20 g5 + 10 g4 + 5 g3 + 2 g2 + g1

We first show that c1 = g1 and c2 = g2. First of all, 0 ≤ g1 < 2 for otherwise we
could increase g2 to obtain a lexically larger tuple. Also 0 ≤ c1 < 2 for otherwise
we could increase c2 to obtain a larger tuple with a smaller count. The next step is
to prove that 0 ≤ 2 c2 + c1 < 5 and 0 ≤ 2 g2 + g1 < 5. This is done by showing that,
if 2 c2 + c1 ≥ 5, then there is a larger tuple for the same amount with the same or a
smaller count. Details are given in Exercise 7.15. The proof of the second inequality
is the same. As a result we have

A mod 5 = 2 c2 + c1 = 2 g2 + g1

and so 2 (c2 − g2) = g1 − c1 < 2. Hence c1 = g1 and c2 = g2. Next, setting

B = (A − (2 c2 + c1))/5

we have

B = 40 c8 + 20 c7 + 10 c6 + 4 c5 + 2 c4 + c3
  = 40 g8 + 20 g7 + 10 g6 + 4 g5 + 2 g4 + g3

Now 0 ≤ g3 < 2, for otherwise we could increase g4 to obtain a larger tuple. And
0 ≤ c3 < 2, for otherwise there would be a tuple with smaller count. Hence

B mod 2 = c3 = g3

For the next step, set C = (B − c3)/2. Then

C = 20 c8 + 10 c7 + 5 c6 + 2 c5 + c4
C = 20 g8 + 10 g7 + 5 g6 + 2 g5 + g4

The same reasoning as in the first step shows that c4 = g4 and c5 = g5. For the next
step, set D = (C − (2 c5 + c4))/5, so

D = 4 c8 + 2 c7 + c6
D = 4 g8 + 2 g7 + g6

The same argument as in the second step shows c6 = g6. Setting E = (D − c6)/2
we have

E = 2 c8 + c7
E = 2 g8 + g7

and now we can repeat the argument in the first step once again to show that c7 = g7
and c8 = g8.

There is no shortcut to this rather lengthy reasoning about currency; each denomination
has to be dealt with separately. After all, the argument might break down only
with larger denominations. Essentially the same argument works for US currency. 
It also works for denominations that are successive powers of some base or, more 
generally, when each denomination is a multiple of the next lower denomination. 
But there appears to be no simple characterisation of when it works in general. 
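
Although no simple characterisation is available, a given coinage system can at least be tested mechanically over a range of amounts. The sketch below is a check, not a proof: it compares the count of the greedy tuple with the least possible count, computed from a lazily built table. The names minCounts and greedyOptimalUpTo are hypothetical, Int stands in for Nat, and the pre-decimal denominations used in the usage note are expressed in halfpennies so that all values are integers.

```haskell
-- Greedy change, as derived earlier in this section.
mkchange :: [Int] -> Int -> [Int]
mkchange [1] n      = [n]
mkchange (d : ds) n = c : mkchange ds (n - c * d)
  where c = n `div` d
mkchange [] _       = error "mkchange: denominations must end with 1"

-- minCounts ds !! n is the least number of coins that make up n.
-- Each entry refers only to earlier entries of the same lazy table.
minCounts :: [Int] -> [Int]
minCounts ds = table
  where
    table  = map best [0 ..]
    best 0 = 0
    best m = 1 + minimum [table !! (m - d) | d <- ds, d <= m]

-- Does the greedy tuple have minimum count for every amount up to limit?
greedyOptimalUpTo :: [Int] -> Int -> Bool
greedyOptimalUpTo ds limit =
  and (zipWith (==) (map (sum . mkchange ds) [0 .. limit]) (minCounts ds))
```

Under these assumptions the check succeeds for the UK and US denominations on the ranges tried, and fails for [4,3,1] and for the pre-decimal list [60,48,24,12,6,2,1], where 96 halfpennies (48d) exposes the problem described earlier.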


7.4 Decimal fractions in TeX

Our third problem involving lists of numbers has to do with Knuth’s typesetting
system TeX, the system used to typeset this book. The source language of TeX is
decimal. For instance, one can use \hspace{0.2134156in} to get a space of that
width. But internally TeX uses integer arithmetic with all fractions expressed
as an integer multiple of 1/2^16 = 1/65536. For example, 0.2134156 is represented
by the integer 13986, as is the shorter fraction 0.21341. There is therefore the
problem of converting a decimal fraction to its closest internal representation and,
conversely, converting an internal representation to its shortest decimal fraction. In
either direction only limited-precision integer arithmetic is allowed, the arithmetic
of Int. The first direction is easy but the other one involves a greedy algorithm.

Let us consider the external-to-internal problem first. With Digit as a synonym
for Int restricted to digits d in the range 0 ≤ d < 10, a decimal fraction representing
a real number r in the range 0 ≤ r < 1 can be converted into a floating-point number
(see Exercise 1.11) by

fraction :: [Digit] → Double
fraction = foldr shiftr 0

shiftr :: Digit → Double → Double
shiftr d r = (fromIntegral d + r)/10

For example, 0.d1d2d3 is converted into the real number

(d1 + (d2 + (d3 + 0)/10)/10)/10 = d1/10 + d2/100 + d3/1000

The conversion function fromIntegral is needed in Haskell to convert an integer
(here a digit) into a floating-point number before it can be added to another floating-point
number. Such conversion functions obscure the arithmetic and from now on
we will silently ignore them in arithmetic reasoning.

The function scale converts the result r into the nearest multiple of 2^−16, namely
⌊2^16 × r + 1/2⌋. Hence, since 2^17 = 131072, we can define

scale :: Double → Int
scale r = ⌊(131072 × r + 1)/2⌋

The external-to-internal TeX problem is now specified by

intern :: [Digit] → Int
intern = scale · fraction

Well and good, but intern uses fractional arithmetic to compute the result and the
requirement was to use only limited-precision integer arithmetic. So there is still a
problem to overcome.

The solution is to try to fuse scale and fraction into one function using the fusion
law of foldr. But this turns out not to be possible (see Exercise 7.19). The best we
can do is to decompose scale into two functions and fuse just one of them with
fraction. We have for all integers a and b, with b > 0, and real x that

⌊(x + a)/b⌋ = ⌊(⌊x⌋ + a)/b⌋        (7.1)

The proof, which uses the rule of floors (see Exercise 4.1), is left as another exercise.
In particular, taking x = 131072 r, a = 1, and b = 2, it follows that

scale = halve · convert

where

halve n = (n + 1) div 2
convert r = ⌊131072 × r⌋

It is possible to fuse convert and fraction. We have

convert · foldr shiftr 0 = foldr shiftn 0

provided we can find a function shiftn to satisfy the fusion condition

convert (shiftr d r) = shiftn d (convert r)

To discover shiftn we reason

  convert (shiftr d r)
=   { definitions }
  ⌊131072 × (d + r)/10⌋
=   { (7.1) }
  (131072 × d + ⌊131072 × r⌋) div 10
=   { definition of convert }
  (131072 × d + convert r) div 10

Hence we can define

shiftn d n = (131072 × d + n) div 10

We have shown that

intern :: [Digit] → Int
intern = halve · foldr shiftn 0

That solves the external-to-internal problem. The largest integer that can arise during
this computation is at most 1310720, so Int arithmetic is sufficient. Notice that we
have nowhere exploited any property of 131072 except that it was a positive integer.
But for 2^17 the algorithm can be optimised: except for the first 17 digits, all the
other digits of the fraction can be discarded because they cannot affect the answer. 
The proof is left as Exercise 7.20. 
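
The two versions of the conversion can be compared side by side. The following is a minimal sketch: internSpec is a hypothetical name for the floating-point specification scale · fraction (whose results are subject to rounding, so it is included for comparison only), while intern is the integer-only program obtained by fusion. The optimisation of Exercise 7.20 is not applied.

```haskell
type Digit = Int

-- Specification: convert to a Double, then round to a multiple of 2^-16.
fraction :: [Digit] -> Double
fraction = foldr shiftr 0
  where shiftr d r = (fromIntegral d + r) / 10

scale :: Double -> Int
scale r = floor ((131072 * r + 1) / 2)

internSpec :: [Digit] -> Int
internSpec = scale . fraction

-- Fused version: integer arithmetic only, intermediate values below 1310720.
intern :: [Digit] -> Int
intern = halve . foldr shiftn 0
  where
    halve n    = (n + 1) `div` 2
    shiftn d n = (131072 * d + n) `div` 10
```

For the examples quoted in this section, intern [2,1,3,4,1,5,6] and intern [2,1,3,4,1] both give 13986, and intern [0,1,5,2,6] gives 1000.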

The other direction is to find for a given n in the range 0 ≤ n < 2^16 some shortest
decimal fraction whose internal representation is n. Again, only limited-precision
integer arithmetic is allowed. We know by now how to set up the problem:

extern :: Int → [Digit]
extern n = minWith length (externs n)

where n is restricted to the range 0 ≤ n < 2^16. Ideally, the function externs n should
return a list of all finite decimals whose internal value is n. The problem is that this is 
an infinite list, so any execution of extern would fail to terminate. For example, the 
17-digit fraction 0.01525115966796875 and the 5-digit fraction 0.01526 both have 
internal value 1000, and so does any fraction between these bounds. Sometimes, as 
here, the set of candidates is infinite, and selecting a best one, though expressible 
mathematically, cannot be formulated as an executable expression. 

One solution, as we have seen above, is to generate only decimals of length 
at most 17. In fact, it is sufficient to generate decimals of length at most 5 (see
Exercise 1.11). Indeed, in the first implementation of TeX a decimal of exactly
length five was always chosen. But this choice proved unsatisfactory (a user who
asked for a 0.4-point rule was told that TeX had actually typeset a 0.39999-point
rule), so Knuth implemented a greedy algorithm.

Instead we will look at another way to generate a finite list of possible decimals, 
one guaranteed to include all the shortest decimals. This method will lead to the 
greedy algorithm. To determine the list, observe that a decimal ds is an element of 
externs n if and only if scale (fraction ds) = n. Abbreviating 131072 to w in what
follows, we have

scale r = n ⇔ 2n − 1 ≤ w r < 2n + 1

since scale r = ⌊(w r + 1)/2⌋. That suggests generalising externs to a function,
decimals say, that takes an interval as argument:

externs n = decimals (2n − 1, 2n + 1)

where, provided a < b and b > 0, the value of decimals (a, b) is any list of decimals
ds satisfying

a ≤ w × fraction ds < b

as long as it includes all the shortest decimals satisfying the constraint. To arrive at
a definition of decimals, observe first that

a ≤ w × fraction [] < b ⇔ a ≤ 0 < b

so we can set decimals (a, b) = [[]] if a ≤ 0. Secondly, we have

  a ≤ w × fraction (d : ds) < b
⇔   { definition of fraction }
  a ≤ w × shiftr d (fraction ds) < b
⇔   { definition of shiftr, writing r = fraction ds }
  a ≤ w (d + r)/10 < b
⇔   { arithmetic }
  10a − w d ≤ w r < 10b − w d
⇔   { since 0 ≤ r < 1 (and P ⇔ P ∧ Q if P ⇒ Q) }
  (10a/w − 1 < d < 10b/w) ∧ (10a − w d ≤ w r < 10b − w d)
⇔   { since d is an integer }
  (⌊10a/w⌋ ≤ d ≤ ⌊10b/w⌋) ∧ (10a − w d ≤ w r < 10b − w d)
⇔   { since d is a digit }
  (max 0 ⌊10a/w⌋ ≤ d ≤ min 9 ⌊10b/w⌋) ∧
  (10a − w d ≤ w r < 10b − w d)

Hence 

a ≤ w × fraction (d : ds) < b
  ⇔ l ≤ d ≤ u ∧ 10a − w d ≤ w × fraction ds < 10b − w d

where l = max 0 ⌊10a/w⌋ and u = min 9 ⌊10b/w⌋. That suggests the following
definition of decimals:

decimals :: (Int, Int) → [[Digit]]
decimals (a, b) =
  if a ≤ 0 then [[]]
  else [d : ds | d ← [l .. u], ds ← decimals (10 × a − w × d, 10 × b − w × d)]
  where w = 131072
        l = max 0 ((10 × a) div w)
        u = min 9 ((10 × b) div w)

Given this definition, we claim that externs n returns a list of all decimals ds such
that intern ds = n but intern ds' < n for any proper prefix ds' of ds. For the proof,
observe that the successive intervals generated by decimals (a, b) have lower bounds

a, 10a − w d1, 10 (10a − w d1) − w d2, ...

The kth term of this sequence is

10^k a − w (10^(k−1) d1 + ··· + 10^0 dk) = 10^k (a − w × fraction ds)

Hence decimals (a, b) produces a list of the shortest ds such that a ≤ w r, where
r = fraction ds. Furthermore, 2n − 1 ≤ w r if and only if n ≤ scale r, so externs n
produces decimals ds that scale to n but no prefix of ds does.

However, the above definition of decimals contains a bug that we have encountered
before in the definition of binary search. The problem is that the numbers can
get quite large and the arithmetic of Int is not up to the job. Instead we have to move
over to Integer arithmetic and define decimals as a function with type

decimals :: (Integer, Integer) → [[Digit]]

The reason for the bug and the necessary revisions of decimals and externs are left 
as an exercise. 
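
For experimentation, here is one possible revision with Integer bounds. It is a sketch under assumptions, not necessarily the revision the exercise has in mind: the interval arithmetic is carried out in Integer, digits are converted back to Int with fromInteger, and externs converts its argument with toInteger.

```haskell
type Digit = Int

-- All candidate decimals for the interval (a, b), using Integer bounds so
-- that the repeated multiplication by 10 cannot overflow.
decimals :: (Integer, Integer) -> [[Digit]]
decimals (a, b)
  | a <= 0    = [[]]
  | otherwise = [ fromInteger d : ds
                | d  <- [l .. u]
                , ds <- decimals (10 * a - w * d, 10 * b - w * d) ]
  where
    w = 131072
    l = max 0 ((10 * a) `div` w)
    u = min 9 ((10 * b) `div` w)

-- Candidate decimals whose internal value is n, for 0 <= n < 2^16.
externs :: Int -> [[Digit]]
externs n = decimals (2 * m - 1, 2 * m + 1)
  where m = toInteger n
```

With this version, externs 3456 is a finite list that contains both [0,5,2,7,3] and [0,5,2,7,4], together with some longer candidates.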

Now that we have ensured that externs returns a finite list, we can return to
consideration of extern. As in the coin-changing problem, there may be more than 
one shortest fraction with the same internal representation. For example, both 
0.05273 and 0.05274 are shortest fractions whose internal representation is 3456. 
The above definition of extern returns the first fraction while, as we will see, the 
greedy algorithm returns the second. Once again the solution is to change the cost 
function. 

The revised definition of extern should cause no surprise:

extern = maximum · externs

As with coin-changing, we switch to selecting the lexically largest decimal fraction.
The proof that the largest fraction returned by externs is a shortest fraction is given
later on.

We will omit the calculation that gives the following greedy algorithm:

extern :: Int → [Digit]
extern n = decimal (2 × n − 1, 2 × n + 1)

where

decimal :: (Int, Int) → [Digit]
decimal (a, b) = if a ≤ 0 then []
                 else d : decimal (10 × a − w × d, 10 × b − w × d)
                 where d = (10 × b) div w
                       w = 131072

Note first that Integer arithmetic has been replaced by Int arithmetic again. We
claim that b < w for all calls of decimal (a, b). With n < 2^16 we have

2n + 1 ≤ 2 (2^16 − 1) + 1 = 2^17 − 1 < 2^17 = w

so the claim holds for the initial call. Furthermore

10b − w ⌊10b/w⌋ = 10b mod w < w

so the claim holds for recursive calls. With b < w we have 0 ≤ ⌊10b/w⌋ < 10, so d
is always a valid digit.

It remains to show that the largest decimal fraction is also a shortest one. We do
this by showing that if ds1 and ds2 are two different decimals in decimals (a, b), then
ds1 < ds2 ⇒ length ds2 ≤ length ds1. We saw above that, if decimals (a, b) produces
a decimal ds, then it cannot also produce a proper prefix of ds. Hence ds1 cannot be
a prefix of ds2. Now, by definition of lexical order, we have ds1 = us ++ d1 : vs1 and
ds2 = us ++ d2 : vs2, where d1 < d2. Let k be the length of us and n be the decimal
integer formed from the digits in us. It is easy to show that both d1 : vs1 and d2 : vs2
are in

decimals (10^k × a − 131072 × n, 10^k × b − 131072 × n)

But d1 < d2, and that means [d2] is also in this list. And since [d2] and d2 : vs2 cannot
both be in the list unless vs2 is the empty list, we conclude that ds2 = us ++ [d2],
which is no longer than ds1.
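
The greedy algorithm and the integer-only intern from earlier in this section can be run together as a round-trip check. A minimal sketch, with the local definitions arranged in where clauses; the function roundTrips is a hypothetical helper introduced only for this illustration.

```haskell
type Digit = Int

-- External-to-internal conversion, as derived earlier in this section.
intern :: [Digit] -> Int
intern = halve . foldr shiftn 0
  where
    halve n    = (n + 1) `div` 2
    shiftn d n = (131072 * d + n) `div` 10

-- Greedy internal-to-external conversion for 0 <= n < 2^16.
extern :: Int -> [Digit]
extern n = decimal (2 * n - 1, 2 * n + 1)
  where
    w = 131072
    decimal :: (Int, Int) -> [Digit]
    decimal (a, b)
      | a <= 0    = []
      | otherwise = d : decimal (10 * a - w * d, 10 * b - w * d)
      where d = (10 * b) `div` w   -- largest digit allowed by the upper bound

-- Converting out and back in again should return the original value.
roundTrips :: Int -> Bool
roundTrips n = intern (extern n) == n
```

For instance, extern 3456 yields [0,5,2,7,4] and extern 1000 yields [0,1,5,2,6], in line with the examples in the text, while extern 26214 yields the single digit [4], the 0.4-point rule from Knuth's anecdote.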


7.5 Nondeterministic functions and refinement 

All three problems in this chapter have been successfully dealt with by changing 
the cost function into another one that guarantees a linear order, so minimum and 
maximum elements are unique. However, this device is not always possible. In 
general, in order to establish a (context-sensitive) greedy condition of the form 

gstep x (minWith cost (candidates xs))
  = minWith cost (map (gstep x) (candidates xs))

when there may be more than one candidate with minimum cost, we have to prove
the very strong property

cost c ≤ cost c' ⇔ cost (gstep x c) ≤ cost (gstep x c')        (7.2)

for all candidates c and c'. To see why, observe that, if c is the first candidate returned 

by candidates with minimum cost, then gstep x c has to be the first candidate with 
minimum cost in the list of extended candidates. This follows from our definition of 
minWith, which selects the first element with minimum cost in a list of candidates. 
To ensure that the extension of a candidate c' earlier in the list has a larger cost, we 
have to show that 

cost c' > cost c ⇒ cost (gstep x c') > cost (gstep x c)        (7.3)

for all candidates c and c'. To ensure that the extension of a candidate c' later in the 
list does not have a smaller cost, we have to show that 

cost c ≤ cost c' ⇒ cost (gstep x c) ≤ cost (gstep x c')        (7.4)

for all c and c'. The conjunction of (7.3) and (7.4) is (7.2). 

The problem is that (7.2) is so strong that it rarely holds in practice. A similar 
condition is needed if, say, minWith returned the last element in a list with minimum 
cost. What we really need is a form of reasoning that allows us to establish the 
necessary fusion condition from the simple monotonicity condition (7.4) alone, 
and the plain fact of the matter is that equational reasoning with any definition of 
minWith is simply not adequate to provide it. 

It follows that we have to abandon equational reasoning, at least for a function 
like minWith. One general approach is to replace our functional framework with a 
relational one, and to reason instead about the inclusion of one relation in another. 
But for our purposes this solution is way too drastic, more akin to a heart transplant 
than a tube of solvent for occasional use. The alternative, if it can be made to 
work smoothly, is to introduce a nondeterministic variant of minWith and to reason 
about the refinement of one expression by another instead of the equality of two 
expressions. 

Suppose we introduce MinWith cost as a nondeterministic function, specified by 
the assertion that x is a possible value of MinWith cost xs precisely when xs is a 
finite nonempty list of well-defined values, x is an element of xs, and cost x ≤ cost y
for all elements y of xs. Note the initial capital letter: MinWith is not part of Haskell.
It is not our intention to extend Haskell with nondeterministic functions. Instead, 
MinWith is simply there to extend our powers of specification and will not appear 
in any final algorithm. 

We will write x ← MinWith cost xs to mean that x is one possible element of
xs with minimum cost. The symbol ← is read as “is a refinement of”. Think of
MinWith cost xs as the set of elements of xs with minimum cost and interpret ←
as set membership. The situation is analogous to order notation, in which O(g(n))
is interpreted as a set of functions and the equality sign in f(n) = O(g(n)) as
set membership. For example, 1 ← MinWith cost [1,2] is a true assertion provided
cost 1 ≤ cost 2. On the other hand, neither MinWith cost [] nor MinWith cost [1,⊥,2]
is well-defined.

More generally, if E1 and E2 are possibly nondeterministic expressions of the
same type T, then we will write E1 ← E2 to mean that

v ← E1 ⇒ v ← E2

for all values v of type T. Thus the symbol ← in E1 ← E2 should be thought of as set
inclusion. The situation is analogous to an assertion such as 2n^2 + O(n^2) = O(n^2)
in which the = sign really means set inclusion.
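
Although MinWith is a specification device and not part of Haskell, its set-of-values reading can be modelled in Haskell for experimentation. The sketch below is only a model under stated assumptions: a nondeterministic expression is represented by the finite list of its possible values, and the names minsWith, isRefinementOf, and refines are hypothetical.

```haskell
-- Model of MinWith cost xs: every element of xs with minimum cost.
-- Assumption: xs is a finite nonempty list.
minsWith :: Ord b => (a -> b) -> [a] -> [a]
minsWith cost xs = [x | x <- xs, cost x == m]
  where m = minimum (map cost xs)

-- x <- E, read as membership in the list of possible values of E.
isRefinementOf :: Eq a => a -> [a] -> Bool
isRefinementOf = elem

-- E1 <- E2, read as inclusion: every value of E1 is a value of E2.
refines :: Eq a => [a] -> [a] -> Bool
refines e1 e2 = all (`isRefinementOf` e2) e1
```

In this model both 1 and −1 are refinements of MinWith abs [3,−1,2,1], and a deterministic choice such as the singleton list [−1] refines the whole set of minimal elements.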

Next, suppose E and E1 are possibly nondeterministic expressions. Then we
interpret x ← E(E1) to mean that there exists a y such that y ← E1 and x ← E(y).
Consequently, we have

E1 ← E2 ⇒ E(E1) ← E(E2)

Thus all expressions are monotonic under refinement. 

As an example, consider the greedy condition 

gstep x (MinWith cost (candidates xs))
  ← MinWith cost (map (gstep x) (candidates xs))

First of all, this assertion is meaningful only if candidates xs is a finite nonempty
list of well-defined values and gstep x returns well-defined results on well-defined
arguments. In such a case, the assertion

c ← gstep x (MinWith cost (candidates xs))

holds if c = gstep x c' for some c' such that c' ← MinWith cost (candidates xs). The
greedy condition asserts that, for some candidate c' in the list cs = candidates xs
with minimum cost, gstep x c' is a candidate with minimum cost in map (gstep x) cs.
Unlike the previous version of the greedy condition, this assertion does follow from
the simple monotonicity condition (7.4). To spell out the details, suppose c' is a
candidate in cs with minimum cost. We have only to show that

cost (gstep x c') ≤ cost (gstep x c'')

for all candidates c'' in cs. But this follows at once from (7.4) and the assumption
cost c' ≤ cost c''.

Next, we define two nondeterministic expressions of the same type to be equal if
they both have the same set of refinements. Thus

E1 = E2 ⇔ E1 ← E2 ∧ E2 ← E1

For example, consider the distributive law

MinWith cost (concat xss) = MinWith cost (map (MinWith cost) xss)

where xss is a finite nonempty list of finite nonempty lists. This is a law we definitely
want to hold. The equality sign here means that there is no refinement of one side
that is not also a refinement of the other side. We interpret the assertion

xs ← map (MinWith cost) xss

to mean that, if xss = [xs1, xs2, ..., xsn] is a list of finite nonempty lists of well-defined
values, then xs = [x1, x2, ..., xn], where xj ← MinWith cost xsj. The proof
that the distributive law holds is left as Exercise 7.23. 
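
The law can be tried out on examples by modelling each side as the finite list of its possible values. This is an illustration under assumptions, not a proof: minsWith models MinWith, the list monad (via mapM) enumerates every way of choosing one minimal element from each inner list, and the names lhs, rhs, and sameSet are hypothetical.

```haskell
import Data.List (nub, sort)

-- Model of MinWith cost xs: every element of xs with minimum cost.
minsWith :: Ord b => (a -> b) -> [a] -> [a]
minsWith cost xs = [x | x <- xs, cost x == m]
  where m = minimum (map cost xs)

-- Possible values of MinWith cost (concat xss).
lhs :: Ord b => (a -> b) -> [[a]] -> [a]
lhs cost = minsWith cost . concat

-- Possible values of MinWith cost (map (MinWith cost) xss): choose one
-- minimal element from each inner list, then minimise over each choice.
rhs :: Ord b => (a -> b) -> [[a]] -> [a]
rhs cost xss = concatMap (minsWith cost) (mapM (minsWith cost) xss)

-- Equality of the two sides as sets of refinements.
sameSet :: Ord a => [a] -> [a] -> Bool
sameSet xs ys = nub (sort xs) == nub (sort ys)
```

For example, with cost = abs and xss = [[1,−1],[2]], both sides have exactly the refinements 1 and −1.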

What else do we want? Well, we certainly want a refinement version of the fusion 
law for foldr, namely that 

foldr gstep c0 xs ← MinWith cost (foldr fstep [c0] xs)

for all finite lists xs provided

gstep x (MinWith cost ys) ← MinWith cost (fstep x ys)

for all x and all ys of the form ys = foldr fstep [c0] xs. Here is the proof of the fusion
law. The base case is immediate and the induction step is as follows:

  foldr gstep c0 (x : xs)
=   { definition of foldr }
  gstep x (foldr gstep c0 xs)
←   { induction and monotonicity of refinement }
  gstep x (MinWith cost (foldr fstep [c0] xs))
←   { fusion condition }
  MinWith cost (fstep x (foldr fstep [c0] xs))
=   { definition of foldr }
  MinWith cost (foldr fstep [c0] (x : xs))

Let us see what else we might need by redoing the calculation of the greedy 
algorithm for mcc. This time we start with the specification 

mcc xs ← MinWith cost (candidates xs)

For the fusion condition we reason, with cs = candidates xs,

  MinWith cost (fstep x cs)
=   { with fstep = concatMap · extend }
  MinWith cost (concatMap (extend x) cs)
=   { distributive law }
  MinWith cost (map (MinWith cost · extend x) cs)
→   { suppose gstep x xs ← MinWith cost (extend x xs) }
  MinWith cost (map (gstep x) cs)
→   { greedy condition }
  gstep x (MinWith cost (candidates xs))

We write E1 → E2 as an alternative to E2 ← E1. The second step makes use of the
distributive law, and the third step makes use of the monotonicity of refinement. As
we saw above, the greedy condition follows from (7.4). 

We have introduced a single nondeterministic function MinWith cost, which 
will be sufficient for the following two chapters. In Part Four of the book, on 
thinning algorithms, we will need another nondeterministic function, ThinBy, and 
that function will be dealt with in the same way as MinWith, namely by simply 
stating the valid rules of reasoning about refinement. 


7.6 Summary 

Let us summarise the general points that emerge from this chapter: 

1. A greedy algorithm arises from the successful fusion of a function that selects 
a best candidate with a function that generates all candidates, or at least all 
candidates that may turn out to be best ones. 

2. The best candidate can sometimes be defined in different ways. In hillwalking
terms, the highest point may also be the one with the best view and one can 
choose to maximise the height or to maximise the view. In either case the result 
is the same. 

3. Sometimes the simple statement of the greedy condition is too strong because it 
does not take context into account. 

4. While it may be possible to prove that a context-sensitive fusion condition holds 
in special cases, usually by changing the cost function, in general one may have 
to replace reasoning with equality by reasoning with refinement in order to prove 
that a greedy algorithm works. 


7.7 Chapter notes 

Both Insertion sort and Selection sort are well-known sorting algorithms, though
they are not usually described as being greedy algorithms. Knuth starts his
comprehensive text [6] with a study of inversions, and many interesting properties of
inversions can be found there.

The coin-changing problem has a long history. Recent references include [1,5].
The TeX problem was first discussed in [7], under the title “A simple program
whose proof isn’t”, and considered further in [3]. It is remarkable that both of these 
problems succumbed to exactly the same calculation. 

For further information about how to reason about nondeterminism in a functional 
setting, see [4]. There are many articles about nondeterministic functions and 
refinement, and many ways of formalising these ideas. One way is to regard a 
nondeterministic function as a relation, and refinement as the inclusion of one 
relation in another. The relational approach to programming, in a categorical setting,
is described in [2]; this book also contains the TeX problem as an example. Another
approach, which we have more or less followed above with minor syntactic changes, 
is given in [8] and the earlier [9]. These two articles record the pitfalls one can 
tumble into if sufficient care is not taken. 

Exercises

Exercise 7.1 The Data.List library provides a function
    minimumBy :: (a → a → Ordering) → [a] → a
Define minWith using minimumBy.

Exercise 7.2 Write down a definition of a function minsWith f that returns all
the elements of a finite nonempty list that minimise f. In particular the statement
x ← MinWith f xs can be read as x ∈ minsWith f xs.

Exercise 7.3 Prove that, if f is associative, then
    foldr1 f (xs ++ ys) = f (foldr1 f xs) (foldr1 f ys)
for all nonempty lists xs and ys. Hence show that
    foldr1 f (concat xss) = foldr1 f (map (foldr1 f) xss)
provided xss contains only nonempty lists. Finally, show that
    minWith f (concat xss) = minWith f (map (minWith f) xss)
provided xss contains only nonempty lists.

Exercise 7.4 Write down a divide-and-conquer definition of perms. 

Exercise 7.5 Why is the law
    minimum · map (x:) = (x:) · minimum
not valid on empty lists?

Exercise 7.6 Show that minimum · map f = f · minimum on nonempty lists if f is
monotonic. Is the monotonicity condition necessary?

Exercise 7.7 Given that gstep x = minimum · extend x, derive a recursive definition
of gstep.

Exercise 7.8 With gstep as defined in the previous question, show that
    minimum (map (gstep x) xss) = gstep x (minimum xss)
provided all lists in xss have the same length. Give an example to show the condition
fails if xss can contain lists of different length.

Exercise 7.9 Suppose (x_1, ys_1) and (x_2, ys_2) are two picks of the same list. Show
that

    (x_1, ys_1) ≤ (x_2, ys_2) ⇒ x_1 : sort ys_1 ≤ x_2 : sort ys_2

Exercise 7.10 Write down a linear-time algorithm for computing pick. 

Exercise 7.11 Consider evaluation of Insertion sort on the list [3,4,2,5,1]. Keeping
in mind that Haskell is a lazy language, continue the following sequence of
evaluation steps until the first element of the result is obtained:

    gstep 3 (gstep 4 (gstep 2 (gstep 5 (gstep 1 []))))
    gstep 3 (gstep 4 (gstep 2 (gstep 5 (1 : []))))
    gstep 3 (gstep 4 (gstep 2 (1 : gstep 5 [])))

Now answer the following questions. How long does it take to compute head · isort
on a nonempty list? What is the precise sequence of comparisons made for sorting
[3,4,2,5,1]? Does Insertion sort actually work by inserting a new element into a
sorted list at each step?

Exercise 7.12 Explain why mkchange [7,3,1] 54 = [6,4,0], where 
    mkchange ds = minWith count · mktuples ds
What change to the definition of mktuples would produce [7,1,2]?

Exercise 7.13 Here is the weight-based version of coin-changing:

    type Weights = [Int]

    weight :: Weights → Tuple → Int
    weight ws cs = sum (zipWith (×) ws cs)

    mkchangew :: Weights → [Denom] → Nat → Tuple
    mkchangew ws ds = minWith (weight ws) · mktuples ds

In the UK it is the case that minimising count also minimises weight. We could
prove this by simply carrying out an exhaustive test:

    ukws = [1200,950,800,500,650,325,712,356]

    test = [n | n ← [1..200], mkchange ukds n ≠ mkchangew ukws ukds n]

We only need to check amounts up to £2. But test returns a nonempty list beginning
with 2 because

    mkchange ukds 2 = [0,0,0,0,0,0,1,0]
    mkchangew ukws ukds 2 = [0,0,0,0,0,0,0,2]

One 2p coin weighs the same as two 1p coins. What has gone wrong, and how can
the test be corrected?

Exercise 7.14 Express the function mktuples as an instance of foldr. (Hint: maintain
a list of pairs, where a pair consists of a tuple and a residual amount, and then at 
the end select the first component of a pair with a zero residue.) Write down the 
associated greedy algorithm. We will use such a definition when we come to discuss 
a thinning algorithm for the same problem. 

Exercise 7.15 Consider the denominations [5,2,1] and a largest tuple [c_3, c_2, c_1]
with minimum count. Show that, if 2c_2 + c_1 ≥ 5, then there is a larger tuple for the
same amount but with a smaller count. If the denominations were [4,3,1], do we
necessarily have 3c_2 + c_1 < 4?

Exercise 7.16 Consider the UR (United Region) currency whose denominations 
are [100,50,20,15,5,2,1]. Explain carefully where the argument as to why the 
greedy algorithm works for UK currency breaks down with UR currency. Does the 
greedy algorithm work for UR currency? 

Exercise 7.17 Prove that, if each denomination is a multiple of the next lower 
denomination, then the greedy algorithm works. 

Exercise 7.18 Prove (7.1) using the rule of floors. 

Exercise 7.19 We calculated that 

    intern = halve · foldr shiftn 0

Suppose halve · foldr shiftn 0 = foldr op 0 for some function op. The associated
fusion condition is

    halve (shiftn d n) = op d (halve n)

for all n of the form n = foldr shiftn 0 ds. Using the rule of floors, we have

    halve (shiftn d n) = (2^17 × d + n + 10) div 20

Now, since halve (2 × n) = halve (2 × n − 1), the fusion condition requires that

    (2^17 × d + 2 × n + 10) div 20 = (2^17 × d + 2 × n + 9) div 20

Your task is to find a two-digit decimal ds with n = foldr shiftn 0 ds such that the
above statement is false for d = 0.

Exercise 7.20 Why can all but the first 17 digits of the input be ignored?

Exercise 7.21 Why does the first definition of decimals contain a bug? Hint: as we
said in the very first chapter, Haskell does not guarantee that the type Int covers
a greater range than [−2^29, 2^29). Would the bug still occur if a Haskell compiler
allowed the range [−2^63, 2^63) for Int? Give the necessary revisions of decimals and
externs that solve the problem.

Exercise 7.22 For n < 2^16 the integer D, where

    D = ⌊10^5 n/2^16 + 1/2⌋

satisfies D < 10^5 and so has at most 5 digits. Using this fact, show that externs n has
at most 5 digits.

Exercise 7.23 To verify the distributive law for MinWith cost, we have to show that,
if x ← MinWith cost (concat xss), then there exists a list xs such that

    xs ← map (MinWith cost) xss ∧ x ← MinWith cost xs

Conversely, we also have to show

    xs ← map (MinWith cost) xss ∧ x ← MinWith cost xs
        ⇒ x ← MinWith cost (concat xss)

Prove these two claims.

Exercise 7.24 Suppose MCC xs = MinWith cost (candidates xs). Show that
    foldr gstep e xs ← MCC xs
provided e ← MCC [] and

    c ← MCC xs ⇒ gstep x c ← MCC (x : xs)
for all candidates c, components x, and lists of components xs.

Exercise 7.25 Define Flip :: Bool → Bool by
    Flip x = MinWith (const 0) [x, not x]

Which of the following assertions are true?

    id ← Flip      not ← Flip      not · not = not      Flip · Flip = Flip


Answers 

Answer 7.1 One simple definition:

    minWith f = minimumBy cmp
      where cmp x y = compare (f x) (f y)

A more efficient definition:

    minWith f = snd · minimumBy cmp · map tuple
      where tuple x = (f x, x)
            cmp (x, _) (y, _) = compare x y
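
For readers who want to load Answer 7.1 into GHCi, here is a minimal sketch of both definitions in plain Haskell syntax. The name minWith' for the tupled version and the use of comparing from Data.Ord are choices made for this illustration, not part of the text.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Simple version: f may be re-applied to the same element at each comparison.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = minimumBy (comparing f)

-- Tupled version: f is applied exactly once per element.
-- (minWith' is a hypothetical name used only to keep both versions in scope.)
minWith' :: Ord b => (a -> b) -> [a] -> a
minWith' f = snd . minimumBy cmp . map tuple
  where
    tuple x = (f x, x)
    cmp (x, _) (y, _) = compare x y
```

Both versions are undefined on the empty list, as the text requires, and both return the leftmost element among those of minimum cost, because minimumBy keeps the earlier element on ties.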

Answer 7.2 One simple definition:

    minsWith f xs = [x | x ← xs, and [f x ≤ f y | y ← xs]]

A more efficient definition:

    minsWith f = map snd · foldr step [] · map tuple
      where tuple x = (f x, x)
            step x [] = [x]
            step x (y : xs) | a < b  = [x]
                            | a == b = x : y : xs
                            | a > b  = y : xs
              where a = fst x; b = fst y

Answer 7.3 We can prove
    foldr1 f (xs ++ ys) = f (foldr1 f xs) (foldr1 f ys)
by induction on xs. The induction step is

    foldr1 f (x : xs ++ ys)
  = { definition of foldr1 }
    f x (foldr1 f (xs ++ ys))
  = { induction }
    f x (f (foldr1 f xs) (foldr1 f ys))
  = { associativity of f }
    f (f x (foldr1 f xs)) (foldr1 f ys)
  = { definition of foldr1 }
    f (foldr1 f (x : xs)) (foldr1 f ys)

The proof of
    foldr1 f (concat xss) = foldr1 f (map (foldr1 f) xss)
is also by induction. The induction step is

    foldr1 f (concat (xs : xss))
  = { definition of concat }
    foldr1 f (xs ++ concat xss)
  = { above }
    f (foldr1 f xs) (foldr1 f (concat xss))
  = { induction }
    f (foldr1 f xs) (foldr1 f (map (foldr1 f) xss))
  = { definition of foldr1 }
    foldr1 f (foldr1 f xs : map (foldr1 f) xss)
  = { definition of map }
    foldr1 f (map (foldr1 f) (xs : xss))

The final claim holds because smaller f is associative.

Answer 7.4 One definition is

    perms :: [a] → [[a]]
    perms []  = [[]]
    perms [x] = [[x]]
    perms xs  = concatMap interleave (cp yss zss)
      where yss = perms ys
            zss = perms zs
            (ys, zs) = splitAt (length xs div 2) xs

    cp :: [a] → [b] → [(a, b)]
    cp xs ys = [(x, y) | x ← xs, y ← ys]

    interleave :: ([a], [a]) → [[a]]
    interleave (xs, []) = [xs]
    interleave ([], ys) = [ys]
    interleave (x : xs, y : ys) = map (x:) (interleave (xs, y : ys)) ++
                                  map (y:) (interleave (x : xs, ys))
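
The divide-and-conquer definition of perms can be tried out directly. The following is a sketch of the same definition in ASCII Haskell, with yss and zss inlined; nothing beyond the Prelude is assumed.

```haskell
-- Divide-and-conquer permutations: permute each half of the list
-- independently, then interleave every pair of results in all possible ways.
perms :: [a] -> [[a]]
perms []  = [[]]
perms [x] = [[x]]
perms xs  = concatMap interleave (cp (perms ys) (perms zs))
  where
    (ys, zs) = splitAt (length xs `div` 2) xs

-- Cartesian product of two lists.
cp :: [a] -> [b] -> [(a, b)]
cp xs ys = [(x, y) | x <- xs, y <- ys]

-- All ways of merging two lists while preserving the order within each.
interleave :: ([a], [a]) -> [[a]]
interleave (xs, []) = [xs]
interleave ([], ys) = [ys]
interleave (x : xs, y : ys) =
  map (x :) (interleave (xs, y : ys)) ++ map (y :) (interleave (x : xs, ys))
```

For instance, perms "ab" evaluates to ["ab","ba"]: the two halves "a" and "b" each have a single permutation, and interleave ("a","b") yields both merges.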

Answer 7.5 We have

    minimum (map (x:) []) = ⊥
    x : minimum [] = x : ⊥

This is a consequence of the fact that the (:) operation is not strict in Haskell.

Answer 7.6 The result clearly holds for a singleton list. For the induction step we
argue as follows:

    minimum (map f (x : xs))
  = { definition of map }
    minimum (f x : map f xs)
  = { definition of minimum }
    min (f x) (minimum (map f xs))
  = { induction }
    min (f x) (f (minimum xs))
  = { claim: min (f x) (f y) = f (min x y) }
    f (min x (minimum xs))
  = { definition of minimum }
    f (minimum (x : xs))

The claim is equivalent to the condition that f is monotonic.

For the second question, suppose that a < b < c, f a < min (f b) (f c), and f c < f b.
Then f is not monotonic but, nevertheless,

    minimum [f a, f b, f c] = f (minimum [a, b, c])

Answer 7.7 It is easy to show gstep x [] = [x]. For the induction step we argue

    gstep x (y : xs)
  = { definition of gstep }
    minimum (extend x (y : xs))
  = { definition of extend }
    minimum ((x : y : xs) : map (y:) (extend x xs))
  = { definition of minimum }
    min (x : y : xs) (minimum (map (y:) (extend x xs)))
  = { since minimum · map (y:) = (y:) · minimum on nonempty lists }
    min (x : y : xs) (y : minimum (extend x xs))
  = { definition of gstep }
    min (x : y : xs) (y : gstep x xs)

Hence we have the definition

    gstep x [] = [x]
    gstep x (y : xs) = min (x : y : xs) (y : gstep x xs)
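
The derived definition can be checked in GHCi. This sketch transcribes gstep as calculated above and, following Exercise 7.11, takes isort to be foldr gstep [].

```haskell
-- gstep as derived in Answer 7.7: the smaller, in lexicographic list order,
-- of putting x at the front or pushing it past the first element.
gstep :: Ord a => a -> [a] -> [a]
gstep x []       = [x]
gstep x (y : xs) = min (x : y : xs) (y : gstep x xs)

-- Insertion sort as a fold of the greedy step.
isort :: Ord a => [a] -> [a]
isort = foldr gstep []
```

As written, gstep builds both candidate lists and compares them in full, so this version is a specification-level sketch rather than an efficient insertion function.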

Answer 7.8 We show that gstep x is monotonic, that is,

    as ≤ bs ⇒ gstep x as ≤ gstep x bs                    (7.5)

whenever as and bs have the same length. The proof is by induction. The claim is
immediate if both as and bs are the empty list. For the induction step, suppose a : as
and b : bs are two lists of the same length with a : as ≤ b : bs, so either a < b, or a = b
and as ≤ bs. If a < b, then
    x : a : as < x : b : bs ∧ a : gstep x as < b : gstep x bs
so gstep x (a : as) < gstep x (b : bs). In the case a = b and as ≤ bs we have
    x : a : as ≤ x : a : bs ∧ a : gstep x as ≤ a : gstep x bs

because, by induction, gstep x as ≤ gstep x bs.

For the second question, we have [] < [1], but

    gstep 2 [] = [2] > [1,2] = gstep 2 [1]

Answer 7.9 Either x_1 < x_2, in which case the implication is immediate, or x_1 = x_2,
in which case ys_1 and ys_2 contain exactly the same elements and sorting these lists
produces the same result.

Answer 7.10 The definition is

    pick [x] = (x, [])
    pick (x : xs) = if x ≤ y then (x, xs) else (y, x : ys)
      where (y, ys) = pick xs
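
A runnable sketch of the linear-time pick follows. The error clause for the empty list and the function ssort, which shows how pick drives Selection sort, are additions made for this illustration; ssort is assumed to match the chapter's Selection sort only in spirit.

```haskell
-- pick returns a smallest element together with the remaining elements,
-- using one comparison per element.
pick :: Ord a => [a] -> (a, [a])
pick []  = error "pick: empty list"   -- added guard, not in the text
pick [x] = (x, [])
pick (x : xs) = if x <= y then (x, xs) else (y, x : ys)
  where
    (y, ys) = pick xs

-- Hypothetical illustration: Selection sort by repeated picking.
ssort :: Ord a => [a] -> [a]
ssort [] = []
ssort xs = y : ssort ys
  where
    (y, ys) = pick xs
```

Note that the remaining elements need not keep their original order: pick [3,1,2] yields (1,[3,2]).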

Answer 7.11 The evaluation sequence is as follows:

    gstep 3 (gstep 4 (gstep 2 (gstep 5 (gstep 1 []))))
    gstep 3 (gstep 4 (gstep 2 (gstep 5 (1 : []))))
    gstep 3 (gstep 4 (gstep 2 (1 : gstep 5 [])))
    gstep 3 (gstep 4 (1 : gstep 2 (gstep 5 [])))
    gstep 3 (1 : gstep 4 (gstep 2 (gstep 5 [])))
    1 : gstep 3 (gstep 4 (gstep 2 (gstep 5 [])))

In answer to the first question, it takes Θ(n) steps to compute the head of Insertion
sort on a list of length n. In answer to the second question, the precise sequence of
comparisons is

    (5,1) (2,1) (4,1) (3,1) (2,5) (4,2) (3,2) (4,5) (3,4) (4,5)

In answer to the third question, Insertion sort is not really sorting by insertion, at
least when evaluated lazily. It is more akin to a sorting algorithm known as Bubble
sort, though not exactly the same sequence of comparisons is performed. The lesson
here is that under lazy evaluation you don’t always get what you think you are
getting.

Answer 7.12 Because mktuples produces tuples in increasing lexical order. If we
change the definition to read

    mktuples :: [Denom] → Nat → [Tuple]
    mktuples [1] n = [[n]]
    mktuples (d : ds) n = [c : cs | c ← [m, m − 1 .. 0], cs ← mktuples ds (n − c × d)]
      where m = n div d

then mktuples would produce tuples in decreasing lexical order and we would have
mkchange [7,3,1] 54 = [7,1,2].
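
The effect of the enumeration order can be observed directly. In this sketch the names mktuplesUp and mktuplesDown are hypothetical labels for the two orders, count is taken to be sum, and minWith is assumed to return the first minimal element, as minimumBy does.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Denom = Int
type Tuple = [Int]

-- Returns the first element of minimum cost.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = minimumBy (comparing f)

-- Tuples in increasing lexical order (coin counts tried from 0 upwards).
-- The denominations are assumed to end with 1.
mktuplesUp :: [Denom] -> Int -> [Tuple]
mktuplesUp [1] n = [[n]]
mktuplesUp (d : ds) n =
  [c : cs | c <- [0 .. n `div` d], cs <- mktuplesUp ds (n - c * d)]
mktuplesUp [] _ = []

-- Tuples in decreasing lexical order (coin counts tried from the maximum down).
mktuplesDown :: [Denom] -> Int -> [Tuple]
mktuplesDown [1] n = [[n]]
mktuplesDown (d : ds) n =
  [c : cs | c <- [m, m - 1 .. 0], cs <- mktuplesDown ds (n - c * d)]
  where
    m = n `div` d
mktuplesDown [] _ = []

-- The number of coins in a tuple.
count :: Tuple -> Int
count = sum
```

With these definitions, minWith count (mktuplesUp [7,3,1] 54) is [6,4,0] and minWith count (mktuplesDown [7,3,1] 54) is [7,1,2]. Both tuples use ten coins, so the tie is broken purely by the order of generation.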

Answer 7.13 The culprits, once again, are the definitions of minWith cost and
mktuples. Since one 2p coin weighs exactly the same as two 1p coins, there is no
unique minimum-weight tuple. The test can be corrected by redefining mktuples
as in the previous exercise to generate tuples in decreasing lexical order. Then test
does return the empty list.

Answer 7.14 Expressing mktuples in terms of foldr means processing the list of
denominations from right to left. It follows that, in order to process denominations
in decreasing order of value, we have to reverse given lists of currencies like ukds.
Thus we define

    mktuples ds n = finish (foldr (concatMap · extend) [([], n)] (reverse ds))
      where finish = map fst · filter (λ(cs, r) → r == 0)
            extend d (cs, r) = [(cs ++ [c], r − c × d) | c ← [0 .. r div d]]

That leads to the greedy algorithm

    mkchange ds n = fst (foldr gstep ([], n) (reverse ds))
      where gstep d (cs, r) = (cs ++ [c], r − c × d) where c = r div d

The greedy algorithm can be calculated by fusing maximum with mktuples.

Answer 7.15 We have 2c_2 + c_1 ≥ 5 if c_2 ≥ 3 or if (c_2, c_1) = (2,1). In the first case
we can increase c_3 by one and replace (c_2, c_1) either by (c_2 − 3, 1), if c_1 = 0, or by
(c_2 − 2, 0), if c_1 = 1. In the second case we can increase c_3 by one and set both c_1
and c_2 to zero. In each case this gives a larger tuple with a smaller count.

The answer to the second question is no, because the tuple [c_3, 2, 0] has a smaller
count than [c_3 + 1, 0, 2].

Answer 7.16 Let [c_7, c_6, ..., c_1] be the optimal solution and let [g_7, g_6, ..., g_1] be the
greedy one, so

    A = 100c_7 + 50c_6 + 20c_5 + 15c_4 + 5c_3 + 2c_2 + c_1
    A = 100g_7 + 50g_6 + 20g_5 + 15g_4 + 5g_3 + 2g_2 + g_1

The same argument as in the text shows that c_1 = g_1 and c_2 = g_2. Next, with
B = (A − (2c_2 + c_1))/5 we have

    B = 20c_7 + 10c_6 + 4c_5 + 3c_4 + c_3
    B = 20g_7 + 10g_6 + 4g_5 + 3g_4 + g_3

But now the argument breaks down since we cannot show that c_3 = g_3. In UR
currency we have 1 × 20 + 2 × 5 as the greedy choice for 30 units, while 2 × 15 is
the same amount with one coin fewer.
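
The UR counterexample is easy to confirm by machine. The sketch below compares a straightforward greedy change-maker with an exhaustive search; the names greedy, minCount and urds are introduced here for illustration, and mktuples is the exhaustive generator in the style of Answer 7.12.

```haskell
type Denom = Int

-- Greedy: take as many coins of each denomination as possible, largest first.
greedy :: [Denom] -> Int -> [Int]
greedy [] _ = []
greedy (d : ds) n = c : greedy ds (n - c * d)
  where
    c = n `div` d

-- All tuples of coin counts making up n; the denominations must end with 1.
mktuples :: [Denom] -> Int -> [[Int]]
mktuples [1] n = [[n]]
mktuples (d : ds) n =
  [c : cs | c <- [0 .. n `div` d], cs <- mktuples ds (n - c * d)]
mktuples [] _ = []

-- The least possible number of coins, found by exhaustive search.
minCount :: [Denom] -> Int -> Int
minCount ds n = minimum (map sum (mktuples ds n))

-- The UR denominations of Exercise 7.16.
urds :: [Denom]
urds = [100, 50, 20, 15, 5, 2, 1]
```

Here greedy urds 30 is [0,0,1,0,2,0,0], which uses three coins, while minCount urds 30 is 2.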

Answer 7.17 Suppose, as in the previous solution, we have

    A = d_k c_k + ··· + d_2 c_2 + c_1
    A = d_k g_k + ··· + d_2 g_2 + g_1

Since d_2 divides into each other denomination, we have A mod d_2 = c_1 = g_1. Next,
with B = (A − c_1)/d_2 we have B mod (d_3/d_2) = c_2 = g_2. And so on.

Answer 7.18 We can reason as follows:

    k ≤ ⌊(x + a)/b⌋
  ⇔ { rule of floors }
    kb ≤ x + a
  ⇔ { rule of floors }
    kb − a ≤ ⌊x⌋
  ⇔ { rule of floors }
    k ≤ ⌊(⌊x⌋ + a)/b⌋

Hence ⌊(x + a)/b⌋ = ⌊(⌊x⌋ + a)/b⌋.

Answer 7.19 Take ds = [0,7]. We have n = foldr shiftn 0 [0,7] = 9175 and

    (2^17 × 0 + 2 × 9175 + 10) div 20 = 918
    (2^17 × 0 + 2 × 9175 + 9) div 20 = 917
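
The arithmetic in this answer can be replayed in GHCi. The definitions of shiftn and halve below are assumptions: they are the forms consistent with the formula halve (shiftn d n) = (2^17 × d + n + 10) div 20 and with the value n = 9175 quoted above, and should be checked against the definitions given earlier in the chapter.

```haskell
-- Assumed definitions, chosen to agree with the formulas in this answer.
shiftn :: Integer -> Integer -> Integer
shiftn d n = (2 ^ (17 :: Int) * d + n) `div` 10

halve :: Integer -> Integer
halve n = (n + 1) `div` 2

-- The value n produced by the two-digit decimal ds = [0,7].
nFor07 :: Integer
nFor07 = foldr shiftn 0 [0, 7]

-- Left- and right-hand sides of the condition required by fusion, with d = 0.
lhs, rhs :: Integer
lhs = (2 * nFor07 + 10) `div` 20
rhs = (2 * nFor07 + 9) `div` 20
```

Under these assumptions nFor07 is 9175, lhs is 918 and rhs is 917, so the required equality fails.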

Answer 7.20 Let r = fraction ds and r' = fraction (take 17 ds). Then ⌊10^17 r⌋ =
⌊10^17 r'⌋. Furthermore,

    ⌊(2^17 r + 1)/2⌋ = ⌊(10^17 r + 5^17)/(2 × 5^17)⌋

so scale r = scale r' by (7.1). So, only 17 digits matter. The smallest internal
value is 0, which occurs only if the input decimal is strictly less than the decimal
0.00000762939453125, the value of 2^−17. The largest internal value is 2^16 = 65536,
which occurs only if the input is greater than the decimal representation of 1 − 2^−17,
namely 0.99999237060546875. Hence 17 digits are sometimes necessary.

Answer 7.21 The reason for the bug is that, since 10b − w⌊10a/w⌋ ≥ 10(b − a),
the size of the interval argument to decimals can grow from 2 to 2 × 10^17. Since
2 × 10^17 > 2^29, the upper bound of an interval can exceed the range of Int. However,
2 × 10^17 < 2^63, so the problem does not arise with 64-bit computers. The revised
definition of externs is

    externs :: Int → [[Digit]]
    externs n = decimals (2 × n' − 1, 2 × n' + 1)
      where n' = fromIntegral n

The definition of decimals :: (Integer, Integer) → [[Digit]] is as before except that
the term d : ds has to be replaced with fromInteger d : ds because digits are elements
of Int, not Integer.

Answer 7.22 The fraction D/10^5 produces the internal number n', where

    n' = ⌊2^16 D/10^5 + 1/2⌋

We also have

    D = ⌊10^5 n/2^16 + 1/2⌋

by definition of D. Now

    |n − n'| ≤ |n − 2^16 D/10^5| + |n' − 2^16 D/10^5| ≤ 2^15/10^5 + 1/2 < 1

so n = n'.


Answer 7.23 Suppose xss = [xs_1, ..., xs_n] is a finite nonempty list of finite nonempty
lists. If x ← MinWith cost (concat xss), then x is an element of some list xs_i
with a cost that is no greater than any other element of concat xss. Suppose
x_j ← MinWith cost xs_j for each j ≠ i. Then the list xs = [x_1, ..., x_{i−1}, x, x_{i+1}, ..., x_n]
is such that

    xs ← map (MinWith cost) xss ∧ x ← MinWith cost xs

Conversely, suppose xs = [x_1, ..., x_n] satisfies

    xs ← map (MinWith cost) xss

so x_j ← MinWith cost xs_j for each 1 ≤ j ≤ n. Now take x = x_i for some i such that
x_i ← MinWith cost xs. Then x ← MinWith cost (concat xss). The proof really only
relies on the fact that, if cost x ≤ cost y and cost y ≤ cost z, then cost x ≤ cost z.

Answer 7.24 The proof is by induction on xs. The base step is immediate, and for 
the induction step we can argue 

    foldr gstep e (x : xs)
  = { definition of foldr }
    gstep x (foldr gstep e xs)
  ← { induction }
    gstep x (MCC xs)
  ← { greedy condition }
    MCC (x : xs)

This reasoning is valid for any definition of candidates. However, unlike the greedy 
algorithm based on fusion, it gives no hint about how gstep may be defined. 

Answer 7.25 The assertion not · not = not is, of course, false. The others are true,
including the last one because there is no refinement of Flip · Flip that is not also a
refinement of Flip and vice versa.

The next two problems are about trees, so the greedy algorithms take place in a
wood rather than on a hillside. The problems concern the task of building a tree 
with minimum cost, for two different definitions of cost. The first problem is closely 
related to the tree-building algorithms we have seen before in binary search and 
sorting. The second problem, Huffman coding trees, is of practical importance in 
compressing data effectively. Unlike the problems in the previous chapter, the two 
greedy tree-building algorithms require us to reason about the nondeterministic 
function MinWith in order to prove that they work. 


8.1 Minimum-height trees 

Throughout the chapter we fix attention on one type of tree, called a leaf-labelled 
tree: 

    data Tree a = Leaf a | Node (Tree a) (Tree a)

A leaf-labelled tree is therefore a binary tree with information stored only at the
leaves. Essentially this species of tree, though with an additional constructor Null,
was described in Section 5.2 on Mergesort.

The size of a leaf-labelled tree is the number of its leaves:

    size :: Tree a → Nat
    size (Leaf x) = 1
    size (Node u v) = size u + size v

The height of a tree is defined by

    height (Leaf x) = 0
    height (Node u v) = 1 + height u max height v

With a leaf-labelled tree of size n and height h we have the relationship h < n ≤ 2^h,
so h ≥ ⌈log n⌉.

The fringe of a tree is the list of leaf labels in left-to-right order:

    fringe :: Tree a → [a]
    fringe (Leaf x) = [x]
    fringe (Node u v) = fringe u ++ fringe v

Thus fringe is essentially the same function that we have previously called flatten.
Note that the fringe of a tree is always a nonempty list.
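
As a quick check of these definitions, here they are as a loadable Haskell sketch. Nat is rendered as Int, the deriving clause is added so that trees can be printed and compared, and the example tree is an assumption made for illustration.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

-- Number of leaves.
size :: Tree a -> Int
size (Leaf _)   = 1
size (Node u v) = size u + size v

-- Length of the longest path from the root to a leaf.
height :: Tree a -> Int
height (Leaf _)   = 0
height (Node u v) = 1 + max (height u) (height v)

-- Leaf labels in left-to-right order.
fringe :: Tree a -> [a]
fringe (Leaf x)   = [x]
fringe (Node u v) = fringe u ++ fringe v

-- A small example tree (hypothetical) with fringe [1,2,3].
example :: Tree Int
example = Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)
```

For example, size example is 3 and height example is 2, in agreement with the relationship h < n ≤ 2^h.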

Consider the problem of building a tree of minimum height with a given list as 
fringe. We have already encountered two ways of solving this problem, both of 
which can be implemented to take linear time. The first solution is the divide-and- 
conquer, or top-down, method of Section 5.2: 

    mktree :: [a] → Tree a
    mktree [x] = Leaf x
    mktree xs = Node (mktree ys) (mktree zs)
      where (ys, zs) = splitAt (length xs div 2) xs

This definition does not take linear time, but it is easy to convert it into one that
does. The trick, as we have seen in the treatment of Mergesort in Section 5.2, is to
avoid repeated halving by tupling. Second, we have the bottom-up method, also
described in Section 5.2:

    mktree = unwrap · until single (pairWith Node) · map Leaf

These two ways of building a tree lead to different trees but both have minimum
height. To show that this property holds for the first definition of mktree, let H(n)
denote the height of mktree for an input of length n. Then H satisfies the recurrence
H(1) = 0 and H(n) = 1 + H(⌈n/2⌉) with solution H(n) = ⌈log n⌉ (see Exercise 8.1),
the minimum height possible. The reason why the bottom-up method also produces
a minimum-height tree is left as another exercise.
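
Both constructions can be compared experimentally. In the sketch below the names mktreeTD and mktreeBU distinguish the two versions, and the helpers unwrap, single and pairWith are assumed to behave as in Section 5.2 (they are reconstructed here, not quoted); ceilLog2 is a hypothetical helper for ⌈log n⌉.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

height :: Tree a -> Int
height (Leaf _)   = 0
height (Node u v) = 1 + max (height u) (height v)

fringe :: Tree a -> [a]
fringe (Leaf x)   = [x]
fringe (Node u v) = fringe u ++ fringe v

-- Top-down (divide-and-conquer) construction; the input must be nonempty.
mktreeTD :: [a] -> Tree a
mktreeTD []  = error "mktreeTD: empty list"
mktreeTD [x] = Leaf x
mktreeTD xs  = Node (mktreeTD ys) (mktreeTD zs)
  where
    (ys, zs) = splitAt (length xs `div` 2) xs

-- Assumed helper definitions in the style of Section 5.2.
unwrap :: [a] -> a
unwrap [x] = x
unwrap _   = error "unwrap: not a singleton"

single :: [a] -> Bool
single [_] = True
single _   = False

pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith f (x : y : xs) = f x y : pairWith f xs
pairWith _ xs           = xs

-- Bottom-up construction; the input must be nonempty.
mktreeBU :: [a] -> Tree a
mktreeBU = unwrap . until single (pairWith Node) . map Leaf

-- Ceiling of the base-2 logarithm, for n >= 1.
ceilLog2 :: Int -> Int
ceilLog2 n = length (takeWhile (< n) (iterate (* 2) 1))
```

On the inputs [1..n] for small n, both functions return trees with fringe [1..n] and height ceilLog2 n, although the trees themselves generally differ.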

Let us now change the problem slightly: given a nonempty list of natural numbers, 
can we find a linear-time algorithm for building a tree with minimum cost and the 
given list as fringe, where 

    cost :: Tree Nat → Nat
    cost (Leaf x) = x
    cost (Node u v) = 1 + cost u max cost v

The function cost has the same definition as height except that the ‘height’ of a leaf 
is the label value rather than 0. In fact, if each leaf is replaced by a tree whose height 
is given by the label value, the problem is really of the following form: given a list 
of trees together with their heights, can we find a linear-time algorithm to combine 
them into a single tree of minimum height without changing the shape or order of 
the component trees? To appreciate the problem consider the two trees with the
same fringe

[Figure: two trees with the same fringe]

in which each node is labelled with its cost. The tree on the left has cost 6, but the
tree on the right has minimum cost 5. It is not obvious how to construct a tree with
minimum cost, at least not efficiently, and that is where a greedy algorithm enters
the stage. We start off with a specification and then calculate the algorithm.

The specification is phrased as one of refinement:

    mct :: [Nat] → Tree Nat
    mct xs ← MinWith cost (mktrees xs)

for finite nonempty lists xs, where mktrees xs is a list of all possible trees with fringe
xs. In words, mct xs is some element of mktrees xs with minimum cost.

The function mktrees can be defined in a number of ways. We are going to give
two inductive definitions; other possibilities are discussed in the exercises. The first
method is to define

    mktrees :: [a] → [Tree a]
    mktrees [x] = [Leaf x]
    mktrees (x : xs) = concatMap (extend x) (mktrees xs)

The function extend returns a list of all the ways in which a new element can be
added as a leftmost leaf in a tree:

    extend :: a → Tree a → [Tree a]
    extend x (Leaf y) = [Node (Leaf x) (Leaf y)]
    extend x (Node u v) = [Node (Leaf x) (Node u v)] ++
                          [Node u' v | u' ← extend x u]

For example, applying extend x to the tree

[Figure: a tree]

produces the three trees

[Figure: the three trees produced by extend x]

We might have taken mktrees [] = [] and so defined mktrees as an instance of foldr.
But MinWith is not defined on an empty list and we have to restrict the input to
nonempty lists. The Haskell standard library does not provide a sufficiently general
fold function for nonempty lists (the function foldr1 is not quite general enough),
but if we define foldrn by

    foldrn :: (a → b → b) → (a → b) → [a] → b
    foldrn f g [x] = g x
    foldrn f g (x : xs) = f x (foldrn f g xs)

then the definition of mktrees above can be recast in the form

    mktrees = foldrn (concatMap · extend) (wrap · Leaf)

where wrap converts a value into a singleton list.

The second inductive way of building a tree is to first build a forest, a list of trees: 

type Forest a = [Tree a] 

A forest can be ‘rolled up’ into a tree using 

    rollup :: [Tree a] → Tree a
    rollup = foldl1 Node

The function foldl1 is the Haskell prelude function for folding a nonempty list from
left to right. For example,

    rollup [t1, t2, t3, t4] = Node (Node (Node t1 t2) t3) t4

The converse to rollup is the function spine, defined by

    spine :: Tree a → [Tree a]
    spine (Leaf x) = [Leaf x]
    spine (Node u v) = spine u ++ [v]

This function returns the leftmost leaf of a tree, followed by a list of the right
subtrees along the path from the leftmost leaf of the tree to the root. Provided the
first tree in a forest ts is a leaf, we have

    spine (rollup ts) = ts

We can now define

    mktrees :: [a] → [Tree a]
    mktrees = map rollup · mkforests
where mkforests builds the forests:

    mkforests :: [a] → [Forest a]
    mkforests = foldrn (concatMap · extend) (wrap · wrap · Leaf)

    extend :: a → Forest a → [Forest a]
    extend x ts = [Leaf x : rollup (take k ts) : drop k ts | k ← [1 .. length ts]]

The new version of extend is arguably simpler than the previous one. It works by
rolling up some initial segment of the forest into a tree and adding a new leaf as the
first tree in the new forest. For example,

    extend x [t1, t2, t3] = [[Leaf x, t1, t2, t3],
                             [Leaf x, Node t1 t2, t3],
                             [Leaf x, Node (Node t1 t2) t3]]

The two versions of mktrees are not the same function simply because they produce
the trees in a different order. We will come back to spine and rollup later on.
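
The claim that the two versions of mktrees generate the same trees, in a different order, can be tested on small inputs. In this sketch the forest-based functions are given the hypothetical names extendF and mktreesF so that both versions can coexist in one module; the error clause in foldrn is likewise an addition.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a)
  deriving (Eq, Show)

type Forest a = [Tree a]

-- Fold over a nonempty list, with g applied to the last element.
foldrn :: (a -> b -> b) -> (a -> b) -> [a] -> b
foldrn _ g [x]      = g x
foldrn f g (x : xs) = f x (foldrn f g xs)
foldrn _ _ []       = error "foldrn: empty list"

wrap :: a -> [a]
wrap x = [x]

-- First version: add a new leftmost leaf to a tree in all possible ways.
extend :: a -> Tree a -> [Tree a]
extend x (Leaf y)   = [Node (Leaf x) (Leaf y)]
extend x (Node u v) = Node (Leaf x) (Node u v) : [Node u' v | u' <- extend x u]

mktrees :: [a] -> [Tree a]
mktrees = foldrn (concatMap . extend) (wrap . Leaf)

-- Second version: roll up an initial segment of a forest, then add the leaf.
rollup :: Forest a -> Tree a
rollup = foldl1 Node

extendF :: a -> Forest a -> [Forest a]
extendF x ts = [Leaf x : rollup (take k ts) : drop k ts | k <- [1 .. length ts]]

mkforests :: [a] -> [Forest a]
mkforests = foldrn (concatMap . extendF) (wrap . wrap . Leaf)

mktreesF :: [a] -> [Tree a]
mktreesF = map rollup . mkforests
```

For a fringe of four elements each version produces five trees, and every tree produced by one version also appears in the output of the other.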

Let us now return to the first definition of mktrees, the one expressed directly as
an instance of foldrn. To fuse the two component functions in the definition of mct
we can appeal to the fusion law of foldrn. The context-sensitive version of this law
states that

    foldrn f2 g2 xs ← M (foldrn f1 g1 xs)

for all finite, nonempty lists xs, provided g2 x ← M (g1 x) and

    f2 x (M (foldrn f1 g1 xs)) ← M (f1 x (foldrn f1 g1 xs))

For our problem, M = MinWith cost, f1 = concatMap · extend, and g1 = wrap · Leaf.
Since Leaf x = MinWith cost [Leaf x], we can take g2 = Leaf. For the second fusion
condition we have to find a function, gstep say, so that

    gstep x (MinWith cost (mktrees xs))
        ← MinWith cost (concatMap (extend x) (mktrees xs))

As we saw at the end of the previous chapter, this condition is satisfied if the
monotonicity condition

    cost t ≤ cost t' ⇒ cost (gstep x t) ≤ cost (gstep x t')

holds for all trees t and t' in mktrees xs. However, no such function gstep exists to
satisfy the monotonicity condition. Consider the two trees t1 and t2:

[Figure: the trees t1 and t2, with subtrees labelled by cost]

are the five trees that can be built with fringe [5,6,7,9]. The subtrees of each tree
have been labelled with their costs, so both t1 and t2 have the minimum possible
cost 10. However, the monotonicity condition

    cost t1 ≤ cost t2 ⇒ cost (gstep x t1) ≤ cost (gstep x t2)

fails for any definition of gstep. Take, for example, x = 8. Adding 8 to t1 in the best
possible way gives a tree with minimum cost 11, while adding 8 to t2 in the best
possible way gives a tree with cost 10. So there is no way we can define a function
gstep for which the fusion condition holds. Once again we appear to be stuck, even
with a refinement version of fusion.

The only way out of the wood is to change the cost function, and once again
lexical ordering comes to the rescue. Notice that the list of costs [10,8,7,5] reading
downwards along the left spine of t2 is lexically less than the costs [10,9,5] along
the left spine of t1. The lexical cost, lcost say, is defined by

    lcost :: Tree Nat → [Nat]
    lcost = reverse · scanl1 op · map cost · spine
      where op x y = 1 + (x max y)

The costs of the trees along the left spine are accumulated from left to right by
scanl1 op and then reversed. For example, spine t2 has tree costs [5,6,7,9] and
accumulation gives the list [5,7,8,10], which, when reversed, gives the lexical cost
of t2. Minimising lcost also minimises cost (why?), so we can revise the second
fusion condition to read

    gstep x (MinWith lcost (mktrees xs))
        ← MinWith lcost (concatMap (extend x) (mktrees xs))

This time we can show

    lcost t1 ≤ lcost t2 ⇒ lcost (gstep x t1) ≤ lcost (gstep x t2)

where gstep is specified by

    gstep x ts ← MinWith lcost (extend x ts)

To give a constructive definition of gstep and to prove that monotonicity holds,
consider the two trees of Figure 8.1 in which t_1 is a leaf. The tree on the left is the
result of rolling up the forest [t_1, t_2, ..., t_n] into a single tree. The tree on the right is
obtained by adding x as a new leaf after rolling up the first j elements of the forest.

[Figure 8.1 Inserting x into a tree]

The trees are labelled with cost information, so

    c_1 = cost t_1
    c_k = 1 + (c_{k−1} max cost t_k)

for 2 ≤ k ≤ n. In particular, [c_1, c_2, ..., c_n] is strictly increasing. A similar definition
holds for the costs on the right:

    c'_j = 1 + (x max c_j)
    c'_k = 1 + (c'_{k−1} max cost t_k)

for j + 1 ≤ k ≤ n. In particular, since adding a new leaf cannot reduce costs, we have
c_k ≤ c'_k for j ≤ k ≤ n.

The aim is to define gstep by choosing j to minimise [c'_n, c'_{n−1}, ..., c'_j, x]. For
example, consider the five trees [t_1, t_2, ..., t_5] with costs [5,2,4,9,6]. Then

    [c_1, c_2, ..., c_5] = [5,6,7,10,11]

Take x = 8. There are five possible ways of adding x to the forest, namely by rolling
up j trees for 1 ≤ j ≤ 5. Here they are, with costs on the left and accumulated costs
on the right:

    [8,5,2,4,9,6] → [8,9,10,11,12,13]
    [8,6,4,9,6]   → [8,9,10,11,12]
    [8,7,9,6]     → [8,9,10,11]
    [8,10,6]      → [8,11,12]
    [8,11]        → [8,12]

The forest which minimises lcost is the third one, whose lexical cost is the reverse
of [8,9,10,11].
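
The table above can be regenerated mechanically. The following sketch works with lists of tree costs rather than with trees; the names rolled, options, bestByLcost and bestJ are introduced here for illustration, and bestJ implements the rule for choosing j that the text goes on to state as (8.1).

```haskell
-- Cost of a node whose two subtrees have costs x and y.
op :: Int -> Int -> Int
op x y = 1 + max x y

-- Tree costs of the forest obtained by rolling up the first j trees of a
-- forest with tree costs ks, and then adding a new leaf x at the front.
rolled :: Int -> [Int] -> Int -> [Int]
rolled x ks j = x : (scanl1 op ks !! (j - 1)) : drop j ks

-- Accumulated costs for every choice of j, in the order j = 1, 2, ..., n.
options :: Int -> [Int] -> [[Int]]
options x ks = [scanl1 op (rolled x ks j) | j <- [1 .. length ks]]

-- The j whose reversed accumulated costs are lexically least.
bestByLcost :: Int -> [Int] -> Int
bestByLcost x ks = snd (minimum [(reverse o, j) | (j, o) <- zip [1 ..] (options x ks)])

-- The smallest j < n with 1 + (x max c_j) < c_{j+1}, or n if there is none.
bestJ :: Int -> [Int] -> Int
bestJ x ks = head ([j | (j, c, c') <- zip3 [1 ..] cs (tail cs), op x c < c'] ++ [length ks])
  where
    cs = scanl1 op ks
```

With ks = [5,2,4,9,6], options 8 ks reproduces the five accumulated lists shown above, and both bestByLcost 8 ks and bestJ 8 ks select j = 3.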

We claim that the best choice of j is the smallest value in the range 1 ≤ j < n, if it
exists, such that

    1 + (x max c_j) < c_{j+1}                                  (8.1)

If no such j exists, then choose j = n. For example, with

    [c_1, c_2, c_3, c_4, c_5] = [5,6,7,10,11]

and x = 8, the smallest j satisfying (8.1) is j = 3, with the result

    [x, 1 + (x max c_3), c_4, c_5] = [8,9,10,11]

On the other hand, with x = 9 we have j = 5, with the result

    [x, 1 + (x max c_5)] = [9,12]

To prove (8.1), suppose the claim holds for both j and k, where 1 ≤ j < k < n. Then,
setting c'_j = 1 + (x max c_j) and c'_k = 1 + (x max c_k), the two sequences

    as = [x, c'_j, c_{j+1}, ..., c_{k−1}, c_k, c_{k+1}, ..., c_n]
    bs = [x, c'_k, c_{k+1}, ..., c_n]

are such that reverse as < reverse bs because c_k < c'_k. Hence, the smaller the value
of j, the lower is the cost.

To show that gstep x is monotonic with respect to lcost, suppose

    lcost t_1 = [c_n, c_{n-1}, ..., c_1]
    lcost t_2 = [d_m, d_{m-1}, ..., d_1]

where lcost t_1 ≤ lcost t_2. If these costs are equal, then so are the costs of adding a
new leaf to either tree. Otherwise, if lcost t_1 < lcost t_2 and we remove the common
prefix, say one of length k, then we are left with two trees t'_1 and t'_2 with

    lcost t'_1 = [c_p, ..., c_1]
    lcost t'_2 = [d_q, ..., d_1]

where p = n - k, q = m - k and c_p < d_q. It is sufficient to show that

    lcost (gstep x t'_1) ≤ lcost (gstep x t'_2)

Firstly, suppose (8.1) holds for t'_1 and j < p. Then

    lcost (gstep x t'_1) = [c_p, ..., c_{j+1}, 1 + (x max c_j), x]

But c_p < d_q, and since gstep x t'_2 can only increase the cost of t'_2, we have in this
case that

    lcost (gstep x t'_1) < lcost t'_2 ≤ lcost (gstep x t'_2)

In the second case, suppose (8.1) does not hold for t'_1. In this case

    lcost (gstep x t'_1) = [1 + (x max c_p), x]

Now, either 1 + (x max c_p) < d_q, in which case

    lcost (gstep x t'_1) < lcost t'_2 ≤ lcost (gstep x t'_2)

or 1 + (x max c_p) ≥ d_q, in which case x ≥ d_q - 1 and 1 + (x max d_{q-1}) ≥ d_q. That
means that (8.1) does not hold for t'_2 either, and so we have

    lcost (gstep x t'_1) = [1 + (x max c_p), x]
                         ≤ [1 + (x max d_q), x] = lcost (gstep x t'_2)

That completes the proof of monotonicity.

The next task is to implement gstep. We can rewrite (8.1) by arguing

      1 + (x max c_j) < c_{j+1}
    ⇔ 1 + (x max c_j) < 1 + (c_j max cost t_{j+1})
    ⇔ (x max c_j) < cost t_{j+1}

Hence mct = foldrn gstep Leaf, where

    gstep :: Nat -> Tree Nat -> Tree Nat
    gstep x = rollup . add x . spine

where add is defined by

    add x ts = Leaf x : join x ts

    join x [u] = [u]
    join x (u : v : ts) = if x `max` cost u < cost v
                          then u : v : ts else join x (Node u v : ts)

However, instead of computing spines at each step and then rolling up the spine 
again, we can roll up the forest at the end of the computation. What is wanted for 
this step are functions hstep and g for which 

    foldrn gstep Leaf = rollup . foldrn hstep g

We can discover hstep and g by appealing to the fusion law for foldrn. Notice that
here we are applying the fusion law for foldrn in the anti-fusion, or fission, direction,
splitting a fold into two parts.

Firstly, we require rollup . g = Leaf. Since rollup [Leaf x] = Leaf x, we can define
g by g = wrap . Leaf. Secondly, we want

    rollup (hstep x ts) = gstep x (rollup ts)

for all x and all ts of the form ts = foldrn hstep g xs. Now,

      gstep x (rollup ts)
    =   { definition of gstep }
      rollup (add x (spine (rollup ts)))
    =   { provided the first element of ts is a leaf }
      rollup (add x ts)

Hence we can take hstep = add, provided the first element of ts is a leaf. But
ts = foldrn add (wrap . Leaf) xs for some xs and it is immediate from the definition
of add that the first element of ts is indeed a leaf.

We now have mct = rollup . foldrn add (wrap . Leaf). As a final step, repeated
evaluations of cost can be eliminated by pairing each tree in the forest with its cost. 
That leads to the final algorithm 





    type Pair = (Tree Nat, Nat)

    mct :: [Nat] -> Tree Nat
    mct = rollup . map fst . foldrn hstep (wrap . leaf)

    hstep :: Nat -> [Pair] -> [Pair]
    hstep x ts = leaf x : join x ts

    join :: Nat -> [Pair] -> [Pair]
    join x [u] = [u]
    join x (u : v : ts) = if x `max` snd u < snd v
                          then u : v : ts else join x (node u v : ts)

The functions leaf and node are the smart constructors

    leaf :: Nat -> Pair
    leaf x = (Leaf x, x)

    node :: Pair -> Pair -> Pair
    node (u, c) (v, d) = (Node u v, 1 + c `max` d)

For example, the greedy algorithm applied to the list [5,3,1,4,2] produces the 
forests 

    [Leaf 2]
    [Leaf 4, Leaf 2]
    [Leaf 1, Node (Leaf 4) (Leaf 2)]
    [Leaf 3, Leaf 1, Node (Leaf 4) (Leaf 2)]
    [Leaf 5, Node (Node (Leaf 3) (Leaf 1)) (Node (Leaf 4) (Leaf 2))]

The final forest is then rolled up into the final tree

    Node (Leaf 5) (Node (Node (Leaf 3) (Leaf 1)) (Node (Leaf 4) (Leaf 2)))

with cost 7.
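
For readers who want to run the example, here is a self-contained sketch of the final algorithm. The definitions of Tree, foldrn, rollup and cost are our assumptions about what was introduced earlier in the chapter (rollup as foldl1 Node, and cost as the maximum of leaf label plus depth), Nat is taken to be Int, and join is written with guards rather than a conditional.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Nat = Int
type Pair = (Tree Nat, Nat)

-- Fold over a nonempty list, applying g to the last element.
foldrn :: (a -> b -> b) -> (a -> b) -> [a] -> b
foldrn _ g [x] = g x
foldrn f g (x : xs) = f x (foldrn f g xs)
foldrn _ _ [] = error "foldrn: empty list"

-- Roll a forest up into a single tree, combining from the left.
rollup :: [Tree a] -> Tree a
rollup = foldl1 Node

-- Assumed cost function of Section 8.1.
cost :: Tree Nat -> Nat
cost (Leaf x) = x
cost (Node u v) = 1 + max (cost u) (cost v)

leaf :: Nat -> Pair
leaf x = (Leaf x, x)

node :: Pair -> Pair -> Pair
node (u, c) (v, d) = (Node u v, 1 + max c d)

hstep :: Nat -> [Pair] -> [Pair]
hstep x ts = leaf x : join x ts

-- Roll up leading trees until condition (8.1) holds or one tree remains.
join :: Nat -> [Pair] -> [Pair]
join x (u : v : ts)
  | max x (snd u) < snd v = u : v : ts
  | otherwise = join x (node u v : ts)
join _ ts = ts

mct :: [Nat] -> Tree Nat
mct = rollup . map fst . foldrn hstep (\x -> [leaf x])
```

Evaluating cost (mct [5,3,1,4,2]) should give the cost 7 quoted above.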

It remains to estimate the running time of mct. The critical measure is the number
of calls to join. We can prove by induction that any sequence of hstep operations
applied to a list of length n and returning a forest of length m involves at most
2n - m calls to join. The base case, n = 1 and m = 1, is obvious. For the induction
step, note that join applied to a list of length m' and returning a list of length m is
called m' - m times. Thus, using the induction hypothesis that hstep applied to a list of
length n - 1 and returning a forest of length m' involves at most 2(n - 1) - m' calls
of join, we have hstep applied to a list of length n, and returning a forest of length
m involves at most

    (2(n - 1) - m') + 1 + (m' - m) ≤ 2n - m

calls of join, establishing the induction. Hence the algorithm takes linear time.

Before leaving the problem of building a minimum-cost tree, we make one final
remark. Observe that, when the input is a list consisting entirely of zeros, building
a minimum-cost tree means building a minimum-height tree. It follows that the 
greedy algorithm, with minor changes, also works when the cost is the height of the 
tree. The changes are left as an exercise. 


8.2 Huffman coding trees 

Our second example is Huffman coding trees. As older computer users know only 
too well, it is often necessary to store files of information as compactly as possible.
Suppose the information to be stored is a text consisting of a sequence of characters.
Haskell uses Unicode internally for its Char data type, but the standard text I/O
functions assume that texts are sequences of 8-bit characters, so a text of n characters 
contains 8n bits of information. Each character is represented by a fixed-length 
code, so the characters of a text can be recovered by decoding each successive group 
of eight bits. 

One idea for reducing the total number of bits required to code a text is to abandon 
the notion of fixed-length codes, and seek instead a coding scheme based on the 
relative frequency of occurrence of the characters in the text. The basic idea is to 
take a sample piece of text, estimate the number of times each character appears, 
and choose short codes for the more frequent characters and longer codes for the 
rarer ones. For example, if we take the codes 

    't' → 0
    'e' → 10
    'x' → 11

then “text” can be coded as the bit sequence 010110 of length 6. However, it is 
important that codes are chosen in such a way as to ensure that the coded text can 
be deciphered uniquely. To illustrate, suppose the codes had been

    't' → 0
    'e' → 10
    'x' → 1

Under this scheme, “text” would be coded as the sequence 01010 of length 5.
However, the string “tee” would also be coded by 01010. Obviously this is not what
is wanted. The simplest way to prevent the problem arising is to choose codes so
that no code is a proper prefix of any other - a prefix-free code.
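
The two schemes can be compared with a few lines of code. This is an illustrative sketch only: the names Code, encode and prefixFree are ours, and the ambiguous scheme is taken to be t → 0, e → 10, x → 1, the assignment consistent with the two encodings quoted above.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

type Code = [(Char, String)]

-- Concatenate the codewords of the characters of a text
-- (characters without a codeword are skipped in this sketch).
encode :: Code -> String -> String
encode code = concat . mapMaybe (`lookup` code)

-- A code is prefix-free if no codeword is a prefix of another character's codeword.
prefixFree :: Code -> Bool
prefixFree code =
  and [not (c `isPrefixOf` d) | (x, c) <- code, (y, d) <- code, x /= y]

good, bad :: Code
good = [('t', "0"), ('e', "10"), ('x', "11")]
bad = [('t', "0"), ('e', "10"), ('x', "1")]
```

Here prefixFree good holds, while prefixFree bad fails because the codeword for 'x' is a prefix of the codeword for 'e', which is exactly why “text” and “tee” collide.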

As well as requiring unique decipherability, we also want the coding to be optimal. 
An optimal coding scheme is one that minimises the expected length of the coded 
text. More precisely, if characters c_j, for 1 ≤ j ≤ n, have frequencies of occurrence
p_j, then we want to choose codes with lengths l_j such that

    p_1 l_1 + p_2 l_2 + ... + p_n l_n

is as small as possible.

One method for constructing an optimal code satisfying the prefix property is 
called Huffman coding. Each character is stored in a leaf of a binary tree, the 
structure of which is determined by the computed frequencies. The code for a 
character c is the sequence of binary values describing the path in the tree to the 
leaf containing c. For instance, with the tree 

    Node (Node (Leaf 'b') (Leaf 'e')) (Leaf 't')

the character ‘b’ is coded by 00, the character ‘e’ by 01, and the character ‘t’ by 1. 
Clearly, such a scheme yields a prefix-free code. 

There are four aspects to the problem of implementing Huffman coding: (i) collecting
information from a sample; (ii) building a binary tree; (iii) coding a text; and
(iv) decoding a bit sequence. We deal only with the problem of building a tree. 

So, having analysed the sample, suppose we are given a list of pairs: 

    [(c_1, w_1), (c_2, w_2), ..., (c_n, w_n)]

where for 1 ≤ j ≤ n the c_j are the characters and the w_j are positive integers,
called weights, indicating the frequencies of the characters in the text. The relative
frequency of character c_j occurring is therefore w_j / W, where W = w_1 + w_2 + ... + w_n.
We will suppose w_1 ≤ w_2 ≤ ... ≤ w_n, so that the weights are given in ascending order.

In terms of trees, the cost function we want to minimise can be defined in the 
following way. By definition, the depth of a leaf is the length of the path from the 
root of the tree to the leaf. We can define the list of depths of the leaves in a tree by 

    depths :: Tree a -> [Nat]
    depths = from 0
      where from n (Leaf x)   = [n]
            from n (Node u v) = from (n + 1) u ++ from (n + 1) v

Now introduce the types 

    type Weight = Nat
    type Elem   = (Char, Weight)
    type Cost   = Nat
and define cost by

    cost :: Tree Elem -> Cost
    cost t = sum [w * d | ((_, w), d) <- zip (fringe t) (depths t)]

It is left as an exercise to derive the following alternative definition of cost:

    cost (Leaf e)   = 0
    cost (Node u v) = cost u + cost v + weight u + weight v

    weight :: Tree Elem -> Nat
    weight (Leaf (c, w)) = w
    weight (Node u v)    = weight u + weight v
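
As a sanity check, rather than a derivation, the two definitions can be compared on a small tree. In this sketch Nat is taken to be Int, fringe is assumed to list the leaf labels from left to right, the recursive version is renamed cost' so that both can coexist, and the weights in the example tree are made up for illustration.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Nat = Int
type Weight = Nat
type Elem = (Char, Weight)
type Cost = Nat

-- Assumed definition: leaf labels in left-to-right order.
fringe :: Tree a -> [a]
fringe (Leaf x) = [x]
fringe (Node u v) = fringe u ++ fringe v

depths :: Tree a -> [Nat]
depths = from 0
  where
    from n (Leaf _) = [n]
    from n (Node u v) = from (n + 1) u ++ from (n + 1) v

-- Cost as the weighted sum of leaf depths.
cost :: Tree Elem -> Cost
cost t = sum [w * d | ((_, w), d) <- zip (fringe t) (depths t)]

-- The alternative recursive definition.
cost' :: Tree Elem -> Cost
cost' (Leaf _) = 0
cost' (Node u v) = cost' u + cost' v + weight u + weight v

weight :: Tree Elem -> Nat
weight (Leaf (_, w)) = w
weight (Node u v) = weight u + weight v

-- Hypothetical weights attached to the tree used earlier for 'b', 'e' and 't'.
example :: Tree Elem
example = Node (Node (Leaf ('b', 2)) (Leaf ('e', 3))) (Leaf ('t', 5))
```

Both cost example and cost' example evaluate to 2×2 + 3×2 + 5×1 = 15.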

We might now follow the previous section and specify

    huffman :: [Elem] -> Tree Elem
    huffman ← MinWith cost . mktrees

where mktrees builds all the trees with a given list as fringe. But this specification 
is too strong: it is not required that the input list be the fringe, only that some 
permutation of it is. (However, in Chapter 14 we will consider a version of the 
problem in which the input is required to be the fringe.) One way of correcting the 
definition is to replace mktrees by concatMap mktrees . perms. Another way, and
the one we will pursue, is to design a new version of mktrees. This version will
construct all unordered binary trees. In an unordered binary tree the two children of
a node are regarded as a set of two trees rather than an ordered pair. Thus Node u v is
regarded as the same tree as Node v u. For example, there are 12 ordered binary trees
whose fringe is a permutation of [1,2,3], two trees for each of the six permutations,
but only three essentially different unordered trees:

    Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)
    Node (Node (Leaf 1) (Leaf 3)) (Leaf 2)
    Node (Node (Leaf 2) (Leaf 3)) (Leaf 1)

Each tree can be flipped in three ways (flipping the children of the top tree, the
children of the left subtree, or both) to give the 12 different ordered binary trees.
For Huffman coding it is sufficient to consider unordered trees because two sibling
characters have the same codes except for the last bit and it does not matter which
sibling is on the left. To compute all the unordered Huffman trees we can start with
a list of leaves in weight order, and then repeatedly combine pairs of trees until a
single tree remains. The pairs are chosen in all possible ways and a combined pair
can be placed back in the list so as to maintain weight order. Thus, in an unordered
tree Node u v we can assume cost u ≤ cost v without loss of generality.

Here is an example to see the idea at work. Showing only the weights, consider 
the following list of four trees in weight order: 
    [Leaf 3, Leaf 5, Leaf 8, Leaf 9]

As a first step we can choose to combine the first and third trees (among six possible
choices) to give

    [Leaf 5, Leaf 9, Node (Leaf 3) (Leaf 8)]

The new tree, with weight 11, is placed last in the list to maintain weight order. As
the next step we can choose to combine the first two trees (among three possible
choices), giving

    [Node (Leaf 3) (Leaf 8), Node (Leaf 5) (Leaf 9)]

The next step is forced as there are only two trees left, and we end up with a singleton
tree

    [Node (Node (Leaf 3) (Leaf 8)) (Node (Leaf 5) (Leaf 9))]

whose fringe is [3,8,5,9]. This bottom-up method for building trees will generate
6 × 3 = 18 trees in total, more than the total number of unordered trees on four
elements, because some trees, such as the one above, are generated twice (see the
exercises). However, the list of trees includes all that are needed.

Now for the details. We define 

    mktrees :: [Elem] -> [Tree Elem]
    mktrees = map unwrap . mkforests . map Leaf

where mkforests builds the list of forests, each forest consisting of a singleton tree.
One way to define this function uses until:

    mkforests :: [Tree Elem] -> [Forest Elem]
    mkforests = until (all single) (concatMap combine) . wrap

The function mkforests takes a list of trees, turns them into a singleton list of forests
by applying wrap, and then repeatedly combines two trees in every possible way
until every forest is reduced to a single tree. Each singleton forest is then unwrapped
to give the final list of trees. The function combine is defined by

    combine :: Forest Elem -> [Forest Elem]
    combine ts = [insert (Node t1 t2) us | ((t1, t2), us) <- pairs ts]

    pairs :: [a] -> [((a, a), [a])]
    pairs xs = [((x, y), zs) | (x, ys) <- picks xs, (y, zs) <- picks ys]

The function picks was defined in Chapter 1. The function insert, whose definition
is left as an exercise, inserts a tree into a list of trees so as to maintain weight order.
Hence combine selects, in all possible ways, a pair of trees from a forest, combines
them into a new tree, and inserts the new tree into the remaining trees.
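
The count in the four-tree example can be reproduced with a small experiment. The sketch below makes several assumptions of its own: trees carry weights only, insert is one possible answer to the exercise (placing a tree after all trees of no greater weight), and pairs is written directly so that each pair of positions is selected exactly once, instead of going through picks, so that the number of choices matches the six and three quoted in the example.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Weight = Int
type Forest = [Tree Weight]

weight :: Tree Weight -> Weight
weight (Leaf w) = w
weight (Node u v) = weight u + weight v

-- Hypothetical insert: place t after every tree whose weight is at most weight t.
insert :: Tree Weight -> Forest -> Forest
insert t ts = us ++ t : vs
  where
    (us, vs) = span (\u -> weight u <= weight t) ts

-- All ways of cutting a list in two.
splits :: [a] -> [([a], [a])]
splits xs = [splitAt n xs | n <- [0 .. length xs]]

-- Each pair of positions i < j once, together with the remaining elements.
pairs :: [a] -> [((a, a), [a])]
pairs xs =
  [ ((x, y), us ++ vs ++ ws)
  | (us, x : rest) <- splits xs
  , (vs, y : ws) <- splits rest
  ]

combine :: Forest -> [Forest]
combine ts = [insert (Node t1 t2) us | ((t1, t2), us) <- pairs ts]

mkforests :: Forest -> [Forest]
mkforests = until (all single) (concatMap combine) . (: [])
  where
    single ts = length ts == 1

mktrees :: [Weight] -> [Tree Weight]
mktrees = map head . mkforests . map Leaf
```

With these definitions, length (mktrees [3,5,8,9]) is 18, and the tree built step by step in the example appears in the list.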

Another way to define mkforests uses the function apply. Recall the answer to 
Question 1.13, which gives the following definition of apply: 
    apply :: Nat -> (a -> a) -> a -> a
    apply n f = if n == 0 then id else f . apply (n - 1) f

Thus apply n applies a function n times to a given value. The alternative definition
of mkforests is to write

    mkforests :: [Tree Elem] -> [Forest Elem]
    mkforests ts = apply (length ts - 1) (concatMap combine) [ts]

The two definitions give the same result because at each step the number of trees
in each forest is reduced by one, so it takes exactly n - 1 steps to reduce an initial
forest of n trees to a list of singleton forests.

Our problem now takes the form 

    huffman :: [Elem] -> Tree Elem
    huffman ← MinWith cost . mktrees

Since mktrees is defined in terms of until, we will aim for a constructive definition
of huffman of the same form. The task is to find a function gstep so that

    unwrap (until single gstep (map Leaf xs)) ← MinWith cost (mktrees xs)

for all finite nonempty lists xs of type [Elem]. More generally, we will seek a
function gstep such that

    unwrap (until single gstep ts) ← MinWith cost (map unwrap (mkforests ts))

for all finite nonempty lists of trees ts. Problems of this form will arise in the 
following chapter too, so let us pause for a little more theory on greedy algorithms. 


Another generic greedy algorithm 

Suppose in this section that the list of candidates is given by a function 
    candidates :: State -> [Candidate]

for some type State. For Huffman coding, states are lists of trees and candidates are
trees:

    candidates ts = map unwrap (mkforests ts)

For the problems in the following chapter, states are combinations of values. 

The aim of this section is to give conditions for which the refinement 

    extract (until final gstep sx) ← MinWith cost (candidates sx)          (8.2)

holds for all states sx. The functions on the left have the following types:

    gstep   :: State -> State
    final   :: State -> Bool
    extract :: State -> Candidate

In words, (8.2) states that repeatedly applying a greedy step to any initial state sx will
result in a final state from which a candidate x can be extracted with the property that
x is a candidate in candidates sx with minimum cost. In order for the refinement to
be meaningful, it is assumed that the left-hand side returns a well-defined value for 
any initial state. Unlike the formulation of a generic greedy algorithm in Section 7.1, 
nothing is known about how the candidates are constructed. 

For brevity in what follows, define 

    MCC sx     = MinWith cost (candidates sx)
    mincost sx = minimum (map cost (candidates sx))

In particular, for all x in candidates sx we have

    x ← MCC sx ⇔ cost x = mincost sx

There are two conditions that ensure (8.2). The first is

    final sx ⇒ extract sx ← MCC sx          (8.3)

This condition holds for Huffman coding, where final = single and extract = unwrap,
since map unwrap (mkforests [t]) = [t] and MinWith cost [t] = t.

The second condition is the greedy condition. We can state it in two ways. The 
first way is 

    not (final sx) ⇒ (∃x : x ← MCC (gstep sx) ∧ x ← MCC sx)          (8.4)

In hillclimbing terms, the greedy condition asserts that, from any starting point not 
already on top of the hill, there is some path to a highest point that starts out with a 
greedy step. 

The second way of stating the greedy condition appears to be stronger: 

    not (final sx) ⇒ MCC (gstep sx) ← MCC sx          (8.5)

However, with one extra proviso, (8.4) implies (8.5). The proviso is that applying 
gstep to a state may reduce the number of final candidates but will never introduce 
new ones. In symbols, 

    candidates (gstep sx) ⊆ candidates sx          (8.6)

Suppose x ← MCC (gstep sx) and x ← MCC sx. Then, by definition of MCC and
mincost, we have

    mincost (gstep sx) = cost x = mincost sx

Now suppose y ← MCC (gstep sx), so y ∈ candidates sx by (8.6). Then

    cost y = mincost (gstep sx) = mincost sx

and so y ← MCC sx. Although, in general, E1 ← E2 is a stronger statement than one
that merely asserts there exists some value v such that v ← E1 ∧ v ← E2, that is not
the case here.

To prove (8.2), suppose that k is the smallest integer - assumed to exist - for
which apply k gstep sx is a final state. That means

    until final gstep sx = apply k gstep sx

It follows that apply j gstep sx is not a final state for 0 ≤ j < k, so, by the stronger
greedy condition, we have

    MCC (apply (j + 1) gstep sx) ← MCC (apply j gstep sx)

for 0 ≤ j < k. Hence MCC (apply k gstep sx) ← MCC sx. Furthermore, by (8.3) we
have

    extract (apply k gstep sx) ← MCC (apply k gstep sx)

establishing (8.2).

This style of reasoning about greedy algorithms is very general. However, unlike 
greedy algorithms derived by fusion, it gives no hint as to what form gstep might 
take. 


Huffman coding continued 

Returning to Huffman coding, in which candidates are trees, it remains to define 
gstep and to show that the greedy condition holds. For Huffman coding we have 

    MCC ts = MinWith cost (map unwrap (mkforests ts))

We take gstep to be the function that combines the two trees in the forest with 
smallest weights. Since trees are kept in weight order, that means 

    gstep (t1 : t2 : ts) = insert (Node t1 t2) ts

For the greedy condition, let ts = [t_1, t_2, ..., t_n] be a list of trees in weight order, with
weights [w_1, w_2, ..., w_n]. The task is to construct a tree t for which

    t ← MCC (gstep ts) ∧ t ← MCC ts

Suppose t' ← MCC ts. We construct t by applying tree surgery to t'. Every tree in
ts appears somewhere as a subtree of t', so imagine that t_i appears at depth d_i in t'
for 1 ≤ i ≤ n. Now, among the subtrees of t', there will be a pair of sibling trees at
greatest depth. There may be more than one such pair, but there will be at least one.
Suppose two such trees are t_i and t_j and let d = d_i = d_j. Then d_1 ≤ d and d_2 ≤ d.
Furthermore, t_i and t_j could have been chosen as the first step in the construction
of t'. Without loss of generality, suppose w_1 ≤ w_i and w_2 ≤ w_j. Construct t by
swapping t_i with t_1 and t_j with t_2. Then t can be constructed by taking a greedy first
step. Furthermore

    cost t' - cost t = d_1 w_1 + d_2 w_2 + d (w_i + w_j) - (d_1 w_i + d_2 w_j + d (w_1 + w_2))
                     = (d - d_1) (w_i - w_1) + (d - d_2) (w_j - w_2)
                     ≥ 0

But cost t' is as small as possible, so cost t' = cost t. Hence t ← MCC ts and
t ← MCC (gstep ts).

The same tree surgery can be used to show that the stronger greedy condition
holds by a direct argument. Suppose t ← MCC (gstep ts) but t is not a value in
MCC ts. That means there exists a tree t' ← MCC ts with cost t' < cost t. We now
get a contradiction by applying the surgical procedure to t' to produce another tree
t'' ← MCC (gstep ts) with cost t = cost t'' ≤ cost t'.

Here is the greedy algorithm we have derived: 

    huffman es = unwrap (until single gstep (map Leaf es))
      where gstep (t1 : t2 : ts) = insert (Node t1 t2) ts

However, simple as it is, the algorithm is not quite ready to leave the kitchen. There 
are two sources of inefficiency. Firstly, the function insert recomputes weights at 
each step, an inefficiency that can easily be brushed aside by tupling. The more 
serious issue is that, while finding two trees of smallest weights is a constant-time 
operation, inserting the combined tree back into the forest can take linear time in 
the worst case. That means the greedy algorithm takes quadratic time in the worst 
case. The final step is to show how this can be reduced to linear time. 
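
Before improving the algorithm, it may help to see the simple version run. The following sketch is self-contained; the definition of insert is our own guess at the exercise (a linear scan that places the new tree after all trees of no greater weight), weights are recomputed on demand exactly as criticised above, and the input is assumed to be nonempty and sorted by ascending weight.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Weight = Int
type Elem = (Char, Weight)

weight :: Tree Elem -> Weight
weight (Leaf (_, w)) = w
weight (Node u v) = weight u + weight v

cost :: Tree Elem -> Int
cost (Leaf _) = 0
cost (Node u v) = cost u + cost v + weight u + weight v

-- Hypothetical insert: linear scan, keeping the forest in weight order.
insert :: Tree Elem -> [Tree Elem] -> [Tree Elem]
insert t ts = us ++ t : vs
  where
    (us, vs) = span (\u -> weight u <= weight t) ts

-- Greedy step: combine the two lightest trees.
gstep :: [Tree Elem] -> [Tree Elem]
gstep (t1 : t2 : ts) = insert (Node t1 t2) ts
gstep ts = ts

-- Assumes a nonempty list sorted by ascending weight.
huffman :: [Elem] -> Tree Elem
huffman es = head (until single gstep (map Leaf es))
  where
    single ts = length ts == 1
```

For the weights 3, 5, 8, 9 used earlier, the resulting tree has cost 49, one less than the cost of the tree with fringe [3,8,5,9] built in the example above.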

The key observation behind the linear-time algorithm is the fact that, in any 
call of gstep, the argument to insert has a weight at least as large as any previous 
argument. Suppose we combine two trees with weights w_1 and w_2 and, later on,
two trees with weights w_3 and w_4. We have w_1 ≤ w_2 ≤ w_3 ≤ w_4, and it follows
that w_1 + w_2 ≤ w_3 + w_4. This suggests maintaining the non-leaf trees as a simple
queue, whereby elements are added to the rear of the queue and removed only from 
the front. Instead of maintaining a single list we therefore maintain two lists, the 
first being a list of leaves and the second a queue of node trees. Since elements are 
never added to the first list, but only removed from the front, the first list could 
also be a queue. But a simple list suffices. We will call the first list a stack simply 
to distinguish it from the second one. At each step, gstep selects two lightest trees 
from either the stack or the queue, combines them, and adds the result to the end 
of the queue. At the end of the algorithm the queue will contain a single tree, the 
greedy solution. Figure 8.2, which shows the weights only, gives an example of how 
the method works out. The method is viable only if the various queue operations 
take constant time. But we have already met symmetric lists in Chapter 3, which 
satisfy the requirements exactly. 

Here are the details. First we set up the type SQ of Stack-Queues: 

    type SQ a    = (Stack a, Queue a)
    type Stack a = [a]
    type Queue a = SymList a

Now we can define

    huffman :: [Elem] -> Tree Elem
    huffman = extractSQ . until singleSQ gstep . makeSQ . map leaf

    Stack                 Queue
    1, 2, 4, 4, 6, 9
    4, 4, 6, 9            1+2
    4, 6, 9               4+(1+2)
    9                     4+(1+2), 4+6
                          4+6, 9+(4+(1+2))
                          (4+6)+(9+(4+(1+2)))

Figure 8.2 Example of the stack and queue operations


The component functions on the right-hand side are defined in terms of the type 

    type Pair = (Tree Elem, Weight)

of pairs of trees and weights. First of all, the functions leaf and node (needed in the
definition of gstep) are smart constructors that install weight information correctly:

    leaf :: Elem -> Pair
    leaf (c, w) = (Leaf (c, w), w)

    node :: Pair -> Pair -> Pair
    node (t1, w1) (t2, w2) = (Node t1 t2, w1 + w2)

Next, the function makeSQ initialises a Stack-Queue: 

    makeSQ :: [Pair] -> SQ Pair
    makeSQ xs = (xs, nilSL)

Recall that the function nilSL returns an empty symmetric list. 

Next, the function singleSQ determines whether a Stack-Queue is a singleton, 
and extractSQ extracts the tree:

    singleSQ :: SQ a -> Bool
    singleSQ (xs, ys) = null xs && singleSL ys

    extractSQ :: SQ Pair -> Tree Elem
    extractSQ (xs, ys) = fst (headSL ys)

The function singleSL, whose definition is left as an exercise, tests for whether a 
symmetric list is a singleton. 

Finally, we define 

    gstep :: SQ Pair -> SQ Pair
    gstep ps = add (node p1 p2) rs
      where (p1, qs) = extractMin ps
            (p2, rs) = extractMin qs

    add :: Pair -> SQ Pair -> SQ Pair
    add y (xs, ys) = (xs, snocSL y ys)

It remains to define extractMin for extracting a tree with minimum weight from a
Stack-Queue:

    extractMin :: SQ Pair -> (Pair, SQ Pair)
    extractMin (xs, ys)
      | nullSL ys        = (head xs, (tail xs, ys))
      | null xs          = (headSL ys, (xs, tailSL ys))
      | snd x <= snd y   = (x, (tail xs, ys))
      | otherwise        = (y, (xs, tailSL ys))
      where x = head xs; y = headSL ys

If both the stack and the queue are nonempty, then the tree with the smallest weight 
from either list is selected. If one of the stack and the queue is empty, the selection 
is made from the other component. 
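
The pieces can be assembled into a runnable sketch. Since the symmetric lists of Chapter 3 are not reproduced here, we substitute a simple two-list queue of our own under the same operation names (nilSL, nullSL, singleSL, snocSL, headSL, tailSL); it supports only the operations needed below, in amortised constant time, and is not the SymList implementation itself. The input is assumed to be sorted by weight and to contain at least two elements.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Weight = Int
type Elem = (Char, Weight)
type Pair = (Tree Elem, Weight)

-- Stand-in for SymList. Invariant: if the front list is empty, so is the back
-- list, which means headSL never needs to reverse anything.
data Queue a = Queue [a] [a]

nilSL :: Queue a
nilSL = Queue [] []

nullSL :: Queue a -> Bool
nullSL (Queue f _) = null f

singleSL :: Queue a -> Bool
singleSL (Queue [_] []) = True
singleSL _ = False

snocSL :: a -> Queue a -> Queue a
snocSL x (Queue [] _) = Queue [x] []
snocSL x (Queue f b) = Queue f (x : b)

headSL :: Queue a -> a
headSL (Queue (x : _) _) = x
headSL _ = error "headSL: empty queue"

tailSL :: Queue a -> Queue a
tailSL (Queue [_] b) = Queue (reverse b) []
tailSL (Queue (_ : f) b) = Queue f b
tailSL _ = error "tailSL: empty queue"

type SQ a = ([a], Queue a)

leaf :: Elem -> Pair
leaf (c, w) = (Leaf (c, w), w)

node :: Pair -> Pair -> Pair
node (t1, w1) (t2, w2) = (Node t1 t2, w1 + w2)

-- Remove a lightest tree from the stack or the queue.
extractMin :: SQ Pair -> (Pair, SQ Pair)
extractMin (xs, ys)
  | nullSL ys = (head xs, (tail xs, ys))
  | null xs = (headSL ys, (xs, tailSL ys))
  | snd x <= snd y = (x, (tail xs, ys))
  | otherwise = (y, (xs, tailSL ys))
  where
    x = head xs
    y = headSL ys

-- Combine two lightest trees and add the result to the rear of the queue.
gstep :: SQ Pair -> SQ Pair
gstep ps = (xs, snocSL (node p1 p2) ys)
  where
    (p1, qs) = extractMin ps
    (p2, (xs, ys)) = extractMin qs

huffman :: [Elem] -> Tree Elem
huffman es = fst (headSL ys)
  where
    (_, ys) = until singleSQ gstep (map leaf es, nilSL)
    singleSQ (xs, q) = null xs && singleSL q
```

Applied to characters with the weights 1, 2, 4, 4, 6, 9 of Figure 8.2, the successive states of the stack and queue follow the rows of the figure, up to the left-to-right order of children within each combined tree.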

The linear-time algorithm for Huffman coding depends on the assumption that 
the input is sorted into ascending order of weight. If this were not the case, then 
O(n log n) steps have to be spent sorting. Strictly speaking, that means Huffman
coding actually takes O(n log n) steps. There is an alternative implementation of the
algorithm with this running time, and that is to use a priority queue. Priority queues
will be needed again, particularly in Part Six, so we will consider them now.


8.3 Priority queues 

A priority queue is a data structure PQ for maintaining a list of values so that the 
following two operations take at most logarithmic time in the length of the list: 

    insertQ :: Ord p => a -> p -> PQ a p -> PQ a p
    deleteQ :: Ord p => PQ a p -> ((a, p), PQ a p)

The function insertQ takes a value and a priority and inserts the value into the queue 
with the given priority. The function deleteQ takes a nonempty queue and extracts a 
value whose priority is the smallest, returning the value and its associated priority, 
together with the remaining queue. In a max-priority queue the function deleteQ 
would extract a value with the largest priority. 

As well as the two functions above, we also need some other functions on priority 
queues, including 

    emptyQ   :: PQ a p
    nullQ    :: PQ a p -> Bool
    addListQ :: Ord p => [(a, p)] -> PQ a p -> PQ a p
    toListQ  :: Ord p => PQ a p -> [(a, p)]

The constant emptyQ represents an empty queue, and nullQ tests for an empty
queue. The function addListQ adds a list of value-priority pairs in one fell swoop,
while toListQ returns a list of value-priority pairs in order of priority. The function
addListQ can be defined in terms of insertQ (see the exercises).

One simple implementation of priority queues is to maintain the queue as a list in 
ascending order of priority. But, as we have seen with Huffman’s algorithm, this 
means that insertion is a linear-time operation. A better method is to use a heap, 
similar to the heaps described in Section 5.3. Using a heap guarantees logarithmic 
time for an insertion or deletion, as we will now see. 

The relevant data type for heaps is the following: 

    data PQ a p = Null | Fork Rank a p (PQ a p) (PQ a p)
    type Rank = Nat

A queue is therefore a binary tree. (We use Fork as a constructor rather than Node 
to avoid a name clash with Huffman trees, but continue to refer to ‘nodes’ rather 
than ‘forks’.) The heap condition is that flattening a queue returns a list of elements 
in ascending order of priority: 

    toListQ :: Ord p => PQ a p -> [(a, p)]
    toListQ Null = []
    toListQ (Fork _ x p t1 t2) = (x, p) : mergeOn snd (toListQ t1) (toListQ t2)

The definition of mergeOn is left as an exercise. Thus a queue is a heap in which 
the element at a node has a priority that is no larger than the priorities in each of 
its subtrees. Each node of the queue stores an additional piece of information, the 
rank of that node. By definition, the rank of a tree is the length of the shortest path 
in the tree from the root to a Null tree. A queue is not just a heap but a variety 
called a leftist heap. A tree is leftist if the rank of the left subtree of any node is no 
smaller than the rank of its right subtree. This property makes heaps taller on the 
left, whence its name. A simple consequence of the leftist property is that the length 
of the shortest path from the root of a tree to a Null is always along the right spine 
of the tree. We leave it as an exercise to show that, for a tree of size n, this length is 
at most ⌊log(n+1)⌋.

We can maintain rank information with the help of a smart constructor fork:

    fork :: a -> p -> PQ a p -> PQ a p -> PQ a p
    fork x p t1 t2
      | r2 <= r1  = Fork (r2 + 1) x p t1 t2
      | otherwise = Fork (r1 + 1) x p t2 t1
      where r1 = rank t1; r2 = rank t2

    rank :: PQ a p -> Rank
    rank Null = 0
    rank (Fork r _ _ _ _) = r

In order to maintain the leftist property, the two subtrees are swapped if the left
subtree has lower rank than the right subtree.

Two leftist heaps can be combined into one by the function combineQ, where

    combineQ :: Ord p => PQ a p -> PQ a p -> PQ a p
    combineQ Null t = t
    combineQ t Null = t
    combineQ (Fork k1 x1 p1 l1 r1) (Fork k2 x2 p2 l2 r2)
      | p1 <= p2  = fork x1 p1 l1 (combineQ r1 (Fork k2 x2 p2 l2 r2))
      | otherwise = fork x2 p2 l2 (combineQ (Fork k1 x1 p1 l1 r1) r2)

In the worst case, combineQ traverses the right spines of the two trees. Hence the
running time of combineQ on two leftist heaps of size at most n is O(log n) steps.
Now we can define the insertion and deletion operations (the functions emptyQ and
nullQ are left as exercises):

    insertQ :: Ord p => a -> p -> PQ a p -> PQ a p
    insertQ x p t = combineQ (fork x p Null Null) t

    deleteQ :: Ord p => PQ a p -> ((a, p), PQ a p)
    deleteQ (Fork _ x p t1 t2) = ((x, p), combineQ t1 t2)

Both operations take logarithmic time in the size of the queue. Summarising, by
using a priority queue of n elements rather than an ordered list we can reduce the
time for an insertion to O(log n) steps rather than O(n) steps. The price paid for
this reduction is that the time to find a smallest value goes up from O(1) steps to
O(log n) steps.

Finally, here is the implementation of Huffman’s algorithm using a priority queue: 

    huffman :: [Elem] -> Tree Elem
    huffman = extract . until singleQ gstep . makeQ . map leaf

    extract :: PQ (Tree Elem) Weight -> Tree Elem
    extract = fst . fst . deleteQ

    gstep :: PQ (Tree Elem) Weight -> PQ (Tree Elem) Weight
    gstep ps = insertQ t w rs
      where (t, w)   = node p1 p2
            (p1, qs) = deleteQ ps
            (p2, rs) = deleteQ qs

    makeQ :: Ord p => [(a, p)] -> PQ a p
    makeQ xs = addListQ xs emptyQ

    singleQ :: Ord p => PQ a p -> Bool
    singleQ = nullQ . snd . deleteQ

This algorithm runs in O(n log n) steps without making the assumption that the
input is sorted by weight.
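
Putting the section together gives the following self-contained sketch. The definitions of emptyQ, nullQ and addListQ are our own answers to the exercises (addListQ simply folds insertQ over the list), Nat is taken to be Int, and combineQ uses as-patterns purely for brevity.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

type Weight = Int
type Elem = (Char, Weight)
type Pair = (Tree Elem, Weight)

type Rank = Int
data PQ a p = Null | Fork Rank a p (PQ a p) (PQ a p)

rank :: PQ a p -> Rank
rank Null = 0
rank (Fork r _ _ _ _) = r

-- Smart constructor keeping the subtree of larger rank on the left.
fork :: a -> p -> PQ a p -> PQ a p -> PQ a p
fork x p t1 t2
  | r2 <= r1 = Fork (r2 + 1) x p t1 t2
  | otherwise = Fork (r1 + 1) x p t2 t1
  where
    r1 = rank t1
    r2 = rank t2

combineQ :: Ord p => PQ a p -> PQ a p -> PQ a p
combineQ Null t = t
combineQ t Null = t
combineQ q1@(Fork _ x1 p1 l1 r1) q2@(Fork _ x2 p2 l2 r2)
  | p1 <= p2 = fork x1 p1 l1 (combineQ r1 q2)
  | otherwise = fork x2 p2 l2 (combineQ q1 r2)

emptyQ :: PQ a p
emptyQ = Null

nullQ :: PQ a p -> Bool
nullQ Null = True
nullQ _ = False

insertQ :: Ord p => a -> p -> PQ a p -> PQ a p
insertQ x p = combineQ (fork x p Null Null)

deleteQ :: Ord p => PQ a p -> ((a, p), PQ a p)
deleteQ (Fork _ x p t1 t2) = ((x, p), combineQ t1 t2)
deleteQ Null = error "deleteQ: empty queue"

-- One possible addListQ: insert the pairs one at a time.
addListQ :: Ord p => [(a, p)] -> PQ a p -> PQ a p
addListQ xs q = foldr (uncurry insertQ) q xs

leaf :: Elem -> Pair
leaf (c, w) = (Leaf (c, w), w)

node :: Pair -> Pair -> Pair
node (t1, w1) (t2, w2) = (Node t1 t2, w1 + w2)

extract :: PQ (Tree Elem) Weight -> Tree Elem
extract = fst . fst . deleteQ

singleQ :: PQ (Tree Elem) Weight -> Bool
singleQ = nullQ . snd . deleteQ

-- Remove the two lightest trees and insert their combination.
gstep :: PQ (Tree Elem) Weight -> PQ (Tree Elem) Weight
gstep ps = insertQ t w rs
  where
    (p1, qs) = deleteQ ps
    (p2, rs) = deleteQ qs
    (t, w) = node p1 p2

-- Assumes a nonempty input; the elements need not be sorted.
huffman :: [Elem] -> Tree Elem
huffman = extract . until singleQ gstep . (`addListQ` emptyQ) . map leaf
```

Given the weights 1, 2, 4, 4, 6, 9 in any order, the tree produced has the same cost, 62, as the one built by the stack-and-queue method, although ties may be broken differently.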


8.4 Chapter notes 

The minimum-cost tree problem was first described in [1]. Another way to build 
a minimum-cost tree is to use either the Hu-Tucker [2] or the Garsia-Wachs algorithm [5].
The Hu-Tucker algorithm applies because cost is a regular cost function
as defined in [2]. But the best implementation of the Hu-Tucker algorithm takes
Θ(n log n) steps. The Garsia-Wachs algorithm will be discussed in Section 14.6.

Huffman’s algorithm is a firm favourite in the study of greedy algorithms. It first 
appeared in [3]. The linear-time greedy algorithm based on queues is described 
in [4], which also shows how the algorithm can be generalised to deal with k-ary 
trees rather than just binary trees. If one insists that the fringe of the tree is exactly 
the given character-weight pairs in the order they are given, then the resulting tree, 
called an alphabetic tree by Hu, can be built using the Garsia-Wachs algorithm. 

There are many implementations of priority queues, including leftist heaps, skew
heaps, and maxiphobic heaps. All these can be found in [6, 7].

Exercises

Exercise 8.1 Consider the recurrence H(1) = 0 and H(n) = 1 + H(⌈n/2⌉). Prove by
induction that H(n) = ⌈log n⌉.

Exercise 8.2 Prove that the bottom-up algorithm

mktree = unwrap . until single (pairWith Node) . map Leaf

of Section 8.1 produces a tree of minimum height.

Exercise 8.3 We claimed in Section 8.1 that minimising lcost also minimises cost.
Why is this true?

Exercise 8.4 Why is the claim rollup . spine = id not true for all possible lists of
trees?

Exercise 8.5 The (context-free) fusion rule for foldrn asserts that

foldrn f2 g2 xs ← M (foldrn f1 g1 xs)

for all finite lists xs, provided

g2 x ← M (g1 x)
f2 x (M y) ← M (f1 x y)

Prove this result.


Exercise 8.6 Specialise the final greedy algorithm of Section 8.1 as suggested to 
build a minimum-height tree. 

Exercise 8.7 The function splits :: [a] -> [([a], [a])] splits a list xs into all pairs
of lists (ys, zs) such that xs = ys ++ zs. The function splitsn is similar, except that
it splits a list into pairs of nonempty lists. Give recursive definitions of splits and
splitsn.


Exercise 8.8 Using splitsn, give a recursive definition of the function mktrees of
Section 8.1. Write down a recurrence relation for the function T(n) that counts the
number of trees with n leaves. It can be shown that

T(n) = \frac{1}{n} \binom{2n-2}{n-1}

These values are called the Catalan numbers.




Exercise 8.9 Here is another way of defining the function mktrees of Section 8.1,
one similar to that used in Huffman coding:

mktrees :: [a] -> [Tree a]
mktrees = map unwrap . until (all single) (concatMap combine) . wrap . map Leaf

combine :: Forest a -> [Forest a]
combine xs = [ys ++ [Node x y] ++ zs | (ys, x:y:zs) <- splits xs]

The function combine combines two adjacent trees in a forest in all possible ways.
The process is repeated until only singleton forests remain, forests that consist of
just one tree. Finally the trees are extracted to give a list of trees. This method
may generate the same tree more than once, but all possible trees are nevertheless
produced. Write down the associated greedy algorithm for this version of mktrees
(no justification is required).

Exercise 8.10 In Huffman coding, why does the second, recursive definition of
cost follow from the first?

Exercise 8.11 Define the function insert used in Huffman’s algorithm. 

Exercise 8.12 Give the two ways that the tree

[Node (Node (Leaf 3) (Leaf 8)) (Node (Leaf 5) (Leaf 9))]

can be generated from [Leaf 3, Leaf 5, Leaf 8, Leaf 9].

Exercise 8.13 The number of trees generated in the specification of Huffman’s
algorithm is given for n ≥ 2 by

\binom{n}{2} \binom{n-1}{2} \cdots \binom{2}{2}

Show that this number equals

\frac{n!\,(n-1)!}{2^{n-1}}

Exercise 8.14 Define MCC k xs = MinWith cost (apply k fstep [xs]). Show that

apply k gstep xs ← MCC k xs

provided MCC k (gstep xs) ← MCC (k + 1) xs.

Exercise 8.15 Define the function singleSL :: SymList a -> Bool for determining
whether a symmetric list is a singleton.

Exercise 8.16 Define addListQ in terms of insertQ. 

Exercise 8.17 Define mergeOn. 

Exercise 8.18 Show that, for the trees considered in Section 8.3, a tree of size n
has rank at most ⌊log (n + 1)⌋.

Exercise 8.19 Define emptyQ and nullQ. 

Answers 

Answer 8.1 The base case is immediate and the induction step follows from

⌈log n⌉ = 1 + ⌈log ⌈n/2⌉⌉

This equation can be proved by showing

⌈log n⌉ ≤ k ⟺ 1 + ⌈log ⌈n/2⌉⌉ ≤ k

for any k. Both sides reduce to n ≤ 2^k by appeal to the rule of ceilings, establishing
the result.
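The claim of Exercise 8.1 can also be spot-checked numerically before attempting the proof. The following small sketch is not part of the proof: it transcribes the recurrence directly and compares it with a hand-rolled ceiling of the base-2 logarithm. The names h, ceilLog2 and agreesUpTo are hypothetical.

```haskell
-- The recurrence of Exercise 8.1, with the ceiling of n/2 computed as
-- (n + 1) `div` 2. Defined for n >= 1 only.
h :: Int -> Int
h 1 = 0
h n = 1 + h ((n + 1) `div` 2)

-- ceilLog2 n is the least k with n <= 2^k, that is, the ceiling of log2 n
-- for n >= 1. It counts the powers of two that are strictly below n.
ceilLog2 :: Int -> Int
ceilLog2 n = length (takeWhile (< n) (iterate (* 2) 1))

-- True when the recurrence and the closed form agree on 1 .. m.
agreesUpTo :: Int -> Bool
agreesUpTo m = and [h n == ceilLog2 n | n <- [1 .. m]]
```

Evaluating agreesUpTo 1000 checks the first thousand cases; a check of this kind is evidence for the identity, not a substitute for the induction.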
Answer 8.2 For a list of length n the bottom-up algorithm builds a tree t whose left
child is a perfectly balanced binary tree with 2^k leaves, where 2^k < n ≤ 2^(k+1). The
height of t is therefore k + 1 = ⌈log n⌉, the smallest height possible.

Answer 8.3 Because cost = head . lcost and

us ≤ vs ⇒ head us ≤ head vs

Answer 8.4 The function spine returns the undefined value on trees with an infinite 
spine, so the equation fails. 

Answer 8.5 The base case is easy, and the induction step is

foldrn f2 g2 (x:xs)
= { definition of foldrn }
f2 x (foldrn f2 g2 xs)
← { induction }
f2 x (M (foldrn f1 g1 xs))
← { fusion condition }
M (f1 x (foldrn f1 g1 xs))
= { definition of foldrn }
M (foldrn f1 g1 (x:xs))

Answer 8.6 The algorithm is

greedy = rollup . map fst . foldrn insert (wrap . leaf)
  where insert x ts = leaf x : join ts
        join [u] = [u]
        join (u:v:ts) = if snd u < snd v then u:v:ts else join (node u v : ts)
        leaf x = (Leaf x, 0)

Answer 8.7 The definitions are

splits [] = [([], [])]
splits (x:xs) = ([], x:xs) : [(x:ys, zs) | (ys, zs) <- splits xs]

splitsn [] = []
splitsn [x] = []
splitsn (x:xs) = ([x], xs) : [(x:ys, zs) | (ys, zs) <- splitsn xs]

Answer 8.8 We have

mktrees [x] = [Leaf x]
mktrees xs = [Node u v | (ys, zs) <- splitsn xs, u <- mktrees ys, v <- mktrees zs]

The recurrence relation is given by T(1) = 1 and, for n > 1,

T(n) = \sum_{k=1}^{n-1} T(k)\, T(n-k)
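The definitions in Answers 8.7 and 8.8 can be run as they stand once a Tree type is supplied. The sketch below assumes the usual leaf-labelled binary tree; treeCounts is a hypothetical helper that counts the trees generated for lists of length 1 to 6.

```haskell
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Eq, Show)

-- All ways of splitting a list into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits []     = [([], [])]
splits (x:xs) = ([], x:xs) : [(x:ys, zs) | (ys, zs) <- splits xs]

-- As splits, but both parts must be nonempty.
splitsn :: [a] -> [([a], [a])]
splitsn []     = []
splitsn [_]    = []
splitsn (x:xs) = ([x], xs) : [(x:ys, zs) | (ys, zs) <- splitsn xs]

-- All trees with the given fringe.
mktrees :: [a] -> [Tree a]
mktrees [x] = [Leaf x]
mktrees xs  = [Node u v | (ys, zs) <- splitsn xs, u <- mktrees ys, v <- mktrees zs]

-- Number of trees with n leaves, for n = 1 .. 6.
treeCounts :: [Int]
treeCounts = [length (mktrees [1 .. n]) | n <- [1 .. 6 :: Int]]
```

Here treeCounts evaluates to [1,1,2,5,14,42], which agrees with the recurrence for T(n) and with the Catalan numbers.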


Answer 8.9 The greedy algorithm is

mct :: [Nat] -> Tree Nat
mct = unwrap . until single combine . map Leaf

combine :: Forest Nat -> Forest Nat
combine ts = us ++ [Node u v] ++ vs
  where (us, u:v:vs) = bestjoin ts

The omitted function bestjoin splits a forest into two sub-forests in which the
first two trees of the second forest are trees whose combined cost is minimal. For
example, for the input [5,3,1,4,2,2] this version of mct produces the tree

[figure: tree diagram]



In the case of a minimum-height tree this greedy algorithm can be simplified to give 
the bottom-up algorithm described at the beginning of the chapter. 

Answer 8.10 If the cost of tree u is \sum_i w_i l_i and the cost of tree v is
\sum_j w'_j l'_j, then the cost of Node u v is

\sum_i w_i (l_i + 1) + \sum_j w'_j (l'_j + 1) = cost u + cost v + weight u + weight v

Answer 8.11 We can implement insert by linear search, leading to

insert :: Tree Elem -> Forest Elem -> Forest Elem
insert t1 [] = [t1]
insert t1 (t2:ts) = if weight t1 <= weight t2 then t1 : t2 : ts else t2 : insert t1 ts

Answer 8.12 The same tree is generated either by combining Leaf 3 and Leaf 8 as 
a first step, followed by combining Leaf 5 and Leaf 9, or vice versa. 

Answer 8.13 The proof is by induction. Both expressions equal 1 for n = 2, and 
the induction step is an easy calculation. 





Answer 8.14 The proof is by induction. The case k = 0 is immediate, and for the
induction step we can argue as follows:

apply (k + 1) gstep xs
= { definition of apply }
apply k gstep (gstep xs)
← { induction }
MCC k (gstep xs)
← { given }
MCC (k + 1) xs

Answer 8.15 The definition is

singleSL :: SymList a -> Bool
singleSL (xs, ys) = (null xs && single ys) || (null ys && single xs)

Answer 8.16 We can define

addListQ xs q = foldr (uncurry insertQ) q xs

Answer 8.17 We have

mergeOn :: Ord b => (a -> b) -> [a] -> [a] -> [a]
mergeOn key xs [] = xs
mergeOn key [] ys = ys
mergeOn key (x:xs) (y:ys)
  | key x <= key y = x : mergeOn key xs (y:ys)
  | otherwise = y : mergeOn key (x:xs) ys

Answer 8.18 Essentially we have to show that a tree of rank r and size n satisfies
2^r - 1 ≤ n. Such a tree has one node at depth 0, two nodes at depth 1, and so on.
The tree also has 2^(r-1) nodes at depth r - 1. All this is because the first null node
does not appear until level r. Hence the size of the tree is at least

1 + 2 + ... + 2^(r-1) = 2^r - 1

which establishes the claim.

Answer 8.19 We have

emptyQ = Null
nullQ Null = True
nullQ _ = False


In this chapter we consider two problems for which the candidates are graphs, in fact
special forms of graph called spanning trees. The first problem is about computing
a spanning tree with minimum cost for a connected graph, while the second is about
computing a spanning tree for a directed graph whose edges determine a shortest
path from a given starting vertex to all other vertices. All these terms are made
precise below. The shortest-paths algorithm is then used to solve another problem,
called the jogger’s problem, for computing a cyclic path with minimum total cost.


9.1 Graphs and spanning trees 

We start with some terminology. There are two kinds of graph, a directed graph, also 
called a digraph, and an undirected graph, just called a graph. Certain definitions 
are slightly different for digraphs and graphs, so it is best to consider them as 
separate though closely related species. The minimum-cost spanning tree problem 
deals with graphs, while the shortest-paths problem deals with digraphs. 

By definition, a digraph D is a pair (V, E), where V is a set of vertices, also called
nodes, and E is a set of edges. An edge consists of a pair of vertices (u, v), where u
is the source of the edge and v is the target. Such an edge is directed from u to v. In
a digraph it is possible to have loops - edges of the form (u, u). Because E is a set it
cannot contain an edge more than once, so there is at most one edge with the same
source and target. It follows that a digraph of n vertices cannot have more than n^2
edges, or n(n - 1) edges if there are no loops.

A graph G is also given by a pair (V, E) of vertices and edges, but this time each
edge is a set {u, v} of exactly two vertices. It follows that graphs cannot have loops.
For the purposes of representation we write (u, v) for this set of two vertices, but
(u, v) and (v, u) are considered to be the same edge. A graph of n vertices cannot
have more than n(n - 1)/2 edges. In a sparse graph or digraph with n vertices and
e edges we have e = O(n), while in a dense graph or digraph we have e = Ω(n^2).
Certain algorithms are better tailored for sparse graphs, while others are better for
dense graphs.

For the purposes of this chapter we will need labelled graphs and digraphs. 
By definition, a labelled graph or digraph is a graph in which each edge carries a 
label, usually called its weight. For simplicity, weights are assumed to be integers. 
The weight of an edge is recorded along with the edge, so both for graphs and for 
digraphs the relevant type declarations are: 

type Graph = ([Vertex], [Edge])
type Edge = (Vertex, Vertex, Weight)
type Vertex = Int
type Weight = Int

We will also fix on the definitions

nodes (vs, es) = vs
edges (vs, es) = es
source (u, v, w) = u
target (u, v, w) = v
weight (u, v, w) = w

The representation of a graph as lists of vertices and edges mirrors the mathematical
definition and is acceptable for many problems. For other problems an alternative
representation is superior. This is to view a graph as an adjacency function of type
Vertex -> [(Vertex, Weight)]. The domain of this function is the set of vertices, and
for each vertex u the value of the function applied to u is a set of pairs (v, w) such
that (u, v, w) is a labelled edge. Assuming that vertices are named by integers in the
range 1 to n for some n, a simple implementation of the adjacency function is by an
array, so an alternative description of a graph is

type AdjArray = Array Vertex [(Vertex, Weight)]

We leave it as an exercise to convert between the two descriptions. The adjacency 
array representation of a graph is used for some of the problems in Part Six. 
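One possible way to convert between the two descriptions is sketched below, using accumArray from Data.Array. It assumes that the vertices are exactly 1 to n, and it reads each triple (u, v, w) as directed from u to v. The names toAdj, toGraph, toAdjUndirected and neighbours are illustrative, not the book's.

```haskell
import Data.Array (Array, accumArray, assocs, indices, (!))
import Data.List (sort)

type Vertex   = Int
type Weight   = Int
type Edge     = (Vertex, Vertex, Weight)
type Graph    = ([Vertex], [Edge])
type AdjArray = Array Vertex [(Vertex, Weight)]

-- Store each edge (u, v, w) under its source u. Assumes vertices are 1 .. n.
toAdj :: Graph -> AdjArray
toAdj (vs, es) =
  accumArray (flip (:)) [] (1, length vs) [(u, (v, w)) | (u, v, w) <- es]

-- Recover the vertex list and the edge list from the array.
toGraph :: AdjArray -> Graph
toGraph a = (indices a, [(u, v, w) | (u, vws) <- assocs a, (v, w) <- vws])

-- Undirected variant (an assumption): record each edge under both endpoints.
toAdjUndirected :: Graph -> AdjArray
toAdjUndirected (vs, es) = toAdj (vs, es ++ [(v, u, w) | (u, v, w) <- es])

-- Hypothetical helper: the neighbours of a vertex in a canonical order.
neighbours :: AdjArray -> Vertex -> [(Vertex, Weight)]
neighbours a u = sort (a ! u)
```

Note that accumArray with flip (:) collects the neighbours of each vertex in reverse order of appearance, so the round trip toGraph . toAdj preserves the set of edges but not necessarily their order.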

A path in a graph or digraph is a sequence [v0, v1, ..., vk] of vertices such that
(vj, vj+1) is an edge (directed from vj to vj+1 in the case of digraphs) for 0 ≤ j < k.
Such a path connects v0 and vk. A cycle in a graph or digraph is a path [v0, v1, ..., v0]
whose edges and vertices are all distinct apart from the two endpoints. In a graph
the path [v0, v1, v0] is not a cycle because (v0, v1) and (v1, v0) are the same edge;
consequently, cycles in a graph have lengths greater than two. A graph or digraph
is acyclic if there are no cycles; a graph is connected if there is a path from every
vertex to every other vertex. In an acyclic graph there can be at most one path
between any two vertices.

A connected acyclic graph is called a tree, and a set of trees is called a forest.

type Tree = Graph
type Forest = [Tree]

Trees are therefore not a data type in the sense used in previous chapters, but merely 
a synonym for a special kind of graph. Every graph can be decomposed into the set 
of its connected components. 

A spanning forest of a graph G = (V, E) is a disjoint set of trees

(V1, E1), (V2, E2), ..., (Vk, Ek)

with V = V1 ∪ V2 ∪ ... ∪ Vk whose combined edges E' = E1 ∪ E2 ∪ ... ∪ Ek constitute a
maximal subset of E in the sense that no further edge of G can be added to E' without creating
a cycle. If G is connected, then a spanning forest consists of a single spanning 
tree. A spanning tree of a connected graph with n vertices has exactly n — 1 edges 
(why?). Finally, a minimum-cost spanning tree (MCST) of a connected graph G 
is a spanning tree T of G in which the sum of the weights of the edges in T is as 
small as possible. Our aim in this section is to find efficient methods for computing 
a MCST of a connected graph. It is left as an exercise to generalise the solutions to 
compute a minimum-cost spanning forest of a graph that is not connected. 

To add some life to these definitions, consider a country with a given network of 
towns and roads. The towns are the vertices and each road is an edge connecting 
two towns. Each road can be travelled in either direction, so the graph is undirected. 
We suppose there is at most one road connecting two given towns and the weight 
associated with a road is its length. The network may be connected in that there is a 
route (a path) between every two towns, but it may not. For instance, the country may 
consist of several islands not connected by bridges. If all the towns are connected
by road, then a MCST is a network of roads with no cyclic routes* and of minimum
total cost connecting all the towns.

Finding a MCST does not help with planning shortest routes. The path between
two towns in a MCST is not necessarily the shortest route between the two towns.
We will consider the problem of planning shortest routes in Section 9.5. Finding a
MCST also does not help with a superficially similar problem in which one is given a
connected graph and a subset T of vertices, with the aim of finding a minimum-cost
tree that includes every vertex in T. This problem, called the Steiner tree problem,
is much more challenging and beyond the scope of this book.

Here is a specification of the MCST problem, expressed in our standard way: 

mcst :: Graph -> Tree
mcst ← MinWith cost . spats

The function cost returns the sum of the weights of the edges of the tree:

cost :: Tree -> Int
cost = sum . map weight . edges

* Though routes for cyclists are certainly allowed.

The function spats (short for ‘spanning trees’) generates all the spanning trees of a 
given connected graph. For a graph (V, E) with n vertices, that means finding all
subsets of E of size n - 1 which are both acyclic and connected. One way to define
spats is to add edges one by one into an initially empty set, ensuring at each step 
that the set of edges is acyclic, that is, a forest. Only at the final step when the last 
edge is added is the forest guaranteed to coalesce into a single tree (provided of 
course that the graph is connected). Another way is also to add edges one by one, 
but to ensure at each step that the set of edges is both acyclic and connected, that is, 
a tree. These two methods for generating spanning trees lead to two different greedy 
algorithms, known respectively as Kruskal’s algorithm and Prim’s algorithm. Let us 
examine each in turn. 


9.2 Kruskal’s algorithm 

In Kruskal’s algorithm, the definition of spats is very similar to the definition of 
mktrees in Huffman’s algorithm, except that it works on a list of states rather than 
a list of trees. Each state is a pair consisting of a forest and the list of edges from 
which the next edge can be chosen: 

type State = (Forest, [Edge])

spats :: Graph -> [Tree]
spats = map extract . until (all done) (concatMap steps) . wrap . start

extract :: State -> Tree
extract ([t], _) = t

done :: State -> Bool
done = single . fst

start :: Graph -> State
start g = ([([v], []) | v <- nodes g], edges g)

The starting state consists of a forest of trees, each of which is a graph with a 
single vertex and no edges, and the full set of edges of the graph. A final state is a 
pair consisting of a singleton tree and the list of edges not used in its construction. 
The function extract takes a final state, discards the unused edges, and extracts the 
spanning tree. 

That leaves us with the definition of steps :: State -> [State], which takes a forest
and a list of edges and selects every possible edge that can be added to the forest
without creating a cycle. An edge can be so added if its endpoints belong to different
trees. The result is a forest in which the two trees are combined into one larger tree.
Hence we define

steps :: State -> [State]
steps (ts, es) = [(add e ts, es') | (e, es') <- picks es, safeEdge e ts]

Recall that the function picks :: [a] -> [(a, [a])] picks an element of a nonempty list
in all possible ways, returning both the element and the remaining list. The function
safeEdge is defined by

safeEdge :: Edge -> Forest -> Bool
safeEdge e ts = find ts (source e) /= find ts (target e)

find :: Forest -> Vertex -> Tree
find ts v = head [t | t <- ts, any (== v) (nodes t)]

The value find ts v is the unique tree in the forest ts that contains v as one of its
vertices. Each find operation can take Θ(n) steps in the worst case, since every vertex
of the graph may have to be inspected. A more efficient definition is given later on.
Finally, the function add combines two trees and adds the result to the forest:

add :: Edge -> Forest -> Forest
add e ts = (nodes t1 ++ nodes t2, e : edges t1 ++ edges t2) : rest
  where t1 = find ts (source e)
        t2 = find ts (target e)
        rest = [t | t <- ts, t /= t1 && t /= t2]

It follows that each add operation, like find, takes O(n) steps (ignoring tree
comparisons). Again, a more efficient definition is given later on.

The greedy algorithm for computing a minimum-cost spanning tree is obtained 
by following the path mapped out by the theory in the previous chapter. First, define 

MCC = MinWith cost . map extract . until (all done) (concatMap steps) . wrap

Recall that we have

extract (until done gstep sx) ← MCC sx

for all states sx, provided two conditions are satisfied. The first one is

done sx ⇒ extract sx ← MCC sx

This condition follows from the definition of MCC, since

extract ([t], es) = t ← MCC ([t], es)

The second condition is the greedy condition: there exists a tree t such that

t ← MCC (gstep sx) ∧ t ← MCC sx

To verify the greedy condition we have to choose a definition of gstep. The obvious
choice is to define gstep to select a safe edge of minimum weight. Assuming the list
of edges is in ascending order of weight, we can define gstep by

gstep :: State -> State
gstep (ts, e:es) = if t1 /= t2 then (ts', es) else gstep (ts, es)
  where t1 = find ts (source e)
        t2 = find ts (target e)
        ts' = (nodes t1 ++ nodes t2, e : edges t1 ++ edges t2) : rest
        rest = [t | t <- ts, t /= t1 && t /= t2]

The function gstep selects the first edge whose endpoints are in different trees t1
and t2, and combines t1 and t2 into one tree.

Now, to verify the greedy condition, consider a state sx = (ts, es) consisting of a
forest ts and a list es of unused edges. Let e be an element of es of lightest weight that
is a safe edge for ts. Suppose t ← MCC sx. If t contains the edge e, then t can always
be constructed by choosing e as a first step. Hence t ← MCC (gstep sx) and the
greedy condition is satisfied. Otherwise, t does not contain e, and adding e to t would
create a (unique) cycle. Remove any edge e' in the cycle and replace it with e. The
result is another spanning tree t' with cost t' ≤ cost t, because weight e ≤ weight e'.
Furthermore, since t' contains e and t' can be constructed by choosing e as a first
step, we can take t = t' to satisfy the greedy condition.

Hence one way to formulate Kruskal’s algorithm is as follows: 

kruskal :: Graph -> Tree
kruskal = extract . until done gstep . start

start g = ([([v], []) | v <- nodes g], sortOn weight (edges g))

The function sortOn in the Haskell library Data.List appeared in Exercise 5.12.
There is another way to formulate the algorithm, which is to write

kruskal :: Graph -> Tree
kruskal g = extract (apply (n - 1) gstep (start g))
  where n = length (nodes g)

Given a connected graph with n vertices, we know that gstep will be applied exactly
n - 1 times.
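Collecting the definitions of this section gives a small runnable program. The sketch below follows the first formulation (until done gstep); the extra clause of gstep for an exhausted edge list, and the use of length and head in place of single and a pattern match, are additions made here so that the program is total enough to experiment with.

```haskell
import Data.List (sortOn)

type Vertex = Int
type Weight = Int
type Edge   = (Vertex, Vertex, Weight)
type Graph  = ([Vertex], [Edge])
type Tree   = Graph
type Forest = [Tree]
type State  = (Forest, [Edge])

nodes :: Graph -> [Vertex]
nodes (vs, _) = vs

edges :: Graph -> [Edge]
edges (_, es) = es

source, target :: Edge -> Vertex
source (u, _, _) = u
target (_, v, _) = v

weight :: Edge -> Weight
weight (_, _, w) = w

-- The tree containing v; assumes v occurs in exactly one tree of the forest.
find :: Forest -> Vertex -> Tree
find ts v = head [t | t <- ts, v `elem` nodes t]

-- Take the lightest remaining edge that joins two different trees.
gstep :: State -> State
gstep (_, []) = error "gstep: no safe edge left (is the graph connected?)"
gstep (ts, e : es)
  | t1 /= t2  = (ts', es)
  | otherwise = gstep (ts, es)
  where
    t1   = find ts (source e)
    t2   = find ts (target e)
    ts'  = (nodes t1 ++ nodes t2, e : edges t1 ++ edges t2) : rest
    rest = [t | t <- ts, t /= t1, t /= t2]

kruskal :: Graph -> Tree
kruskal g = extract (until done gstep start)
  where
    start           = ([([v], []) | v <- nodes g], sortOn weight (edges g))
    done (ts, _)    = length ts == 1
    extract (ts, _) = head ts

cost :: Tree -> Int
cost = sum . map weight . edges
```

For the graph ([1,2,3,4], [(2,4,5),(1,3,3),(3,4,4),(1,2,1),(2,3,2)]) the edges of weight 1, 2 and 4 are chosen, so the cost of the resulting tree is 7.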

It remains to time the program. Suppose the graph has n vertices and e edges. As
the graph is assumed to be connected and there is at most one edge between any
two vertices, we have n - 1 ≤ e ≤ n(n - 1)/2. Sorting the edges takes O(e log e)
steps. As we have seen, each find and add operation takes O(n) steps. In the worst
case, all e edges may have to be considered, so there are 2e calls of find and n - 1
calls of add, for a total running time of O(e log e + en + n^2) = O(en) steps.
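The simplification to O(en) can be checked term by term. The following derivation uses only the bounds n - 1 ≤ e ≤ n(n - 1)/2 stated above, and assumes n ≥ 2 so that e ≥ 1:

```latex
\begin{align*}
e \log e &\le e \log n^2 = 2e \log n = O(en)
  && \text{since } e \le n(n-1)/2 < n^2 \text{ and } \log n \le n,\\
n^2 &\le n(e+1) \le 2en = O(en)
  && \text{since } n \le e+1 \text{ and } e \ge 1,\\
e \log e + en + n^2 &= O(en).
\end{align*}
```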

The bottleneck in this algorithm is the time complexity of find and add. A faster
implementation of these functions makes use of a special data structure for computing
with disjoint sets. We turn to this topic next.


9.3 Disjoint sets and the union-find algorithm

The computationally expensive part of Kruskal’s algorithm lies in the maintenance
of a collection of disjoint sets, the vertices of the trees in the forest. Initially, each
vertex is in a set by itself. Each union operation reduces the number of disjoint
sets by one. The function find has to discover which set in the collection contains a
given vertex. Rather than returning the whole set, we can define find to return the
name of the set. The name of a set is some designated vertex in the set. Let DS be
some data type for maintaining disjoint sets of vertices v in the range 1 ≤ v ≤ n for
some n. What we need are the following three operations on DS, in which Name is
a synonym for Vertex:

startDS :: Nat -> DS
findDS :: DS -> Vertex -> Name
unionDS :: Name -> Name -> DS -> DS

The function startDS takes a positive integer n and returns a collection of n singleton
sets, each containing a unique vertex v in the range 1 ≤ v ≤ n. The function findDS
takes a vertex v and returns the name of the set in the collection that contains v. The
function unionDS takes two different names and replaces the two named sets in the
collection by a single set with an appropriately chosen name, in fact the name of the
larger set.

Here is the implementation of Kruskal’s algorithm that uses these three functions.
The disjoint sets of vertices are separated out from the trees in the forest, and all the
tree edges are combined into one set. Thus we change the state to read

type State = (DS, [Edge], [Edge])

Then we can define

kruskal :: Graph -> Tree
kruskal g = extract (apply (n - 1) gstep s)
  where extract (_, es, _) = (nodes g, es)
        n = length (nodes g)
        s = (startDS n, [], sortOn weight (edges g))

The revised definition of gstep is

gstep :: State -> State
gstep (ds, fs, e:es) = if n1 /= n2 then (unionDS n1 n2 ds, e:fs, es)
                       else gstep (ds, fs, es)
  where n1 = findDS ds (source e)
        n2 = findDS ds (target e)

In the simple implementation of Kruskal’s algorithm described above, the three
operations startDS, findDS, and unionDS can each be implemented to take O(n)
steps. But we can do better with two other implementations, which we will call
implementations A and B. In implementation A, the function findDS takes O(log n)
steps in the worst case, while unionDS takes O(n) steps. Recalling that Kruskal’s
algorithm may require 2e calls of findDS and n - 1 calls of unionDS, that means
a total running time of O(e log n + n^2) steps, a significant improvement on O(en)
steps. In implementation B, the function findDS takes O(log^2 n) steps, but unionDS
takes only O(log n) steps. That means a total running time of O(e log^2 n + n log n)
steps, again an improvement on O(en) steps.

These timing bounds are not the best that can be achieved: one can construct
implementations of findDS and unionDS so that a sequence of O(e) find operations
and up to n - 1 union operations takes O(e log n) steps. However, this implementation
seems impossible to achieve in a purely functional setting because it relies
on mutable arrays with a constant-time update function. Although mutable data
structures can be handled with monadic programming, we choose not to do so. The
so-called Union-Find problem is a well-known example of a problem in which the
complexity of the best purely functional solution seems to be inferior to that of the
best imperative one.

Implementation A of DS also uses an array, but the array is an immutable one. 
Recall the following three functions from Section 3.3: 

listArray :: Ix i => (i, i) -> [e] -> Array i e
(!) :: Ix i => Array i e -> i -> e
(//) :: Ix i => Array i e -> [(i, e)] -> Array i e

The first function constructs an array from a pair of bounds and a list of values 
in index order, the second is the array lookup function, and the third is an update 
function. Building an array takes linear time, a lookup takes constant time, but an 
update takes linear time even for an update at a single position. We will use the 
following tailored versions of the three operations above: 

fromList :: [a] -> Array Vertex a
fromList xs = listArray (1, length xs) xs

index :: Array Vertex a -> Vertex -> a
index a v = a ! v

update :: Vertex -> a -> Array Vertex a -> Array Vertex a
update v x a = a // [(v, x)]

Here is the definition of DS based on arrays:

type Size = Nat

data DS = DS { names :: Array Vertex Vertex, sizes :: Array Vertex Size } 

The implementation consists of just two arrays, one for naming the sets in the
collection, and one for computing their sizes. The sets themselves can be determined
from the fact that two elements have the same name if and only if they are in the
same set.

The definition of startDS is

startDS :: Nat -> DS
startDS n = DS (fromList [1..n]) (fromList (replicate n 1))

Recall that we assume vertices are labelled from 1 to n for some n. Initially every
set is a singleton set of size 1. The name of a set is the value of its sole occupant. In
the general case, the name of a set is a value k such that

index (names ds) k = k

Each entry in the names array is either a name or a vertex whose entry is either a
name or another vertex with the same property. Thus we can find the name of the
set containing a specified vertex by tracing back in the names array until an entry is
found that points to itself. That gives us the definition of findDS:

findDS :: DS -> Vertex -> Name
findDS ds x = if x == y then x else findDS ds y
  where y = index (names ds) x

The time complexity of this operation depends on how far away a vertex is from 
the name of the set containing it. We will show how this distance is kept small in a 
moment. Finally, unionDS is defined by 

unionDS :: Name -> Name -> DS -> DS
unionDS n1 n2 ds = DS ns ss
  where (ns, ss) =
          if s1 < s2
          then (update n1 n2 (names ds), update n2 (s1 + s2) (sizes ds))
          else (update n2 n1 (names ds), update n1 (s1 + s2) (sizes ds))
        s1 = index (sizes ds) n1
        s2 = index (sizes ds) n2

The first two arguments of unionDS are different names, not arbitrary vertices. The
sizes of the sets corresponding to the two names are computed, and the smaller
set is absorbed into the larger by renaming the smaller set with the name of the
larger. Finally, the size of the larger set is increased accordingly. The sole but critical
purpose of maintaining size information is to ensure that the number of findDS
operations used in looking up the name of a set is as small as possible. If the first
lookup does not yield the name of a set, it is because the set has been absorbed into
a larger one. A set of size 1 is absorbed into a set of size at least 1, which in turn is
absorbed into a set of size at least 2, which in turn is absorbed into a set of size 4,
and so on. It follows that, if there are k lookups in a search for the name of a set S,
then S has size at least 2^(k-1). That gives us the bound k ≤ ⌊log n⌋ + 1.


Figure 9.1 An example of Union-Find with seven vertices. After each operation,
the two rows show the resulting names and sizes.


An example of the use of these operations is given in Figure 9.1. Observe that the 
second rows show the size correctly only for the name of the set; for example, after 
unionDS 1 2 the set with name 1 has the correct size 2, but the size associated with 
2 (which is no longer a name) remains 1. At the end of the four union operations we 
have 

map (findDS ds) [1..7] = [6,6,6,4,5,6,6]

Thus the set of disjoint sets is reduced to three sets: {1,2,3,6,7} with name 6 and
two singleton sets {4} and {5} with names 4 and 5, respectively. In particular, to
find the name of the set containing 2 we have to evaluate findDS three times:

findDS ds 2 = findDS ds 1 = findDS ds 6 = 6

But the set containing 2 is a set of size 5, and ⌊log 5⌋ + 1 = 3, which is just what
the bound above predicts.
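Implementation A can be tried out directly with Data.Array. The sketch below inlines index and update as (!) and (//), uses Int in place of Nat, and adds a hypothetical helper unionV that first looks up the names of two arbitrary vertices. The example at the end is a separate five-vertex illustration, not the sequence of operations shown in Figure 9.1.

```haskell
import Data.Array (Array, listArray, (!), (//))

type Vertex = Int
type Name   = Vertex
type Size   = Int

data DS = DS { names :: Array Vertex Vertex, sizes :: Array Vertex Size }

-- n singleton sets; every vertex names its own set.
startDS :: Int -> DS
startDS n = DS (listArray (1, n) [1 .. n]) (listArray (1, n) (replicate n 1))

-- Follow the names array until an entry points to itself.
findDS :: DS -> Vertex -> Name
findDS ds x = if x == y then x else findDS ds y
  where y = names ds ! x

-- Absorb the smaller set into the larger; the arguments must be distinct names.
unionDS :: Name -> Name -> DS -> DS
unionDS n1 n2 ds
  | s1 < s2   = DS (names ds // [(n1, n2)]) (sizes ds // [(n2, s1 + s2)])
  | otherwise = DS (names ds // [(n2, n1)]) (sizes ds // [(n1, s1 + s2)])
  where s1 = sizes ds ! n1
        s2 = sizes ds ! n2

-- Hypothetical helper: union the sets containing two arbitrary vertices.
unionV :: Vertex -> Vertex -> DS -> DS
unionV u v ds
  | n1 == n2  = ds
  | otherwise = unionDS n1 n2 ds
  where n1 = findDS ds u
        n2 = findDS ds v

-- Join {1,2}, then {3,4}, then the two pairs; vertex 5 stays on its own.
example :: DS
example = unionV 2 3 (unionV 3 4 (unionV 1 2 (startDS 5)))
```

Here map (findDS example) [1..5] gives [1,1,1,1,5]. Looking up vertex 4 takes three evaluations of findDS (4, then 3, then 1), which matches the bound ⌊log 4⌋ + 1 = 3 for a set of size 4.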

The second implementation, implementation B, of DS uses the data structure of 
random-access lists from Chapter 3. This time we have 

data DS = DS { names :: RAList Vertex, sizes :: RAList Size }

The definition of startDS is

startDS :: Nat -> DS
startDS n = DS (toRA [1..n]) (toRA (replicate n 1))

toRA :: [a] -> RAList a
toRA = foldr consRA nilRA

The definitions of findDS and unionDS remain the same, except for the changes

index xs x = lookupRA (x - 1) xs
update n1 n2 xs = updateRA (n1 - 1) n2 xs

because positions in random-access lists are indexed from 0 rather than 1.

Now we can restate the various running times. The implementation of unionDS
involves two lookups and two updates, for a total of O(n) steps for implementation A,
and O(log n) steps for implementation B. Implementation A of findDS takes
O(log n) steps, whereas implementation B takes O(log^2 n) steps. With implementation A
the total running time of Kruskal’s algorithm is O(n^2) steps on a sparse
graph and O(n^2 log n) steps on a dense graph. With implementation B the times
are O(n log^2 n) steps for a sparse graph and O(n^2 log^2 n) steps for a dense graph. It
follows that implementation B is better for sparse graphs, while implementation A
is better for dense graphs.
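These four figures follow by substitution into the two totals derived earlier in this section, O(e log n + n^2) for implementation A and O(e log^2 n + n log n) for implementation B. The sketch below takes e = O(n) for a sparse graph and e = Θ(n^2) for a dense one; the cost O(e log e) = O(e log n) of sorting the edges is dominated in every case.

```latex
\begin{align*}
\text{A, sparse:} &\quad O(n \log n + n^2) = O(n^2)\\
\text{A, dense:}  &\quad O(n^2 \log n + n^2) = O(n^2 \log n)\\
\text{B, sparse:} &\quad O(n \log^2 n + n \log n) = O(n \log^2 n)\\
\text{B, dense:}  &\quad O(n^2 \log^2 n + n \log n) = O(n^2 \log^2 n)
\end{align*}
```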


9.4 Prim’s algorithm 

The only difference between Prim’s algorithm and Kruskal’s algorithm is that a tree
is constructed at each step rather than a forest. Here is the revised definition of states
and spats:

type State = (Tree, [Edge])

spats :: Graph -> [Tree]
spats g = map fst (until (all done) (concatMap steps) [start g])
  where done (t, es) = (length (nodes t) == length (nodes g))
        start g = (([head (nodes g)], []), edges g)

This time the starting state is defined by arbitrarily selecting the first vertex of g as
the initial tree. The function steps is virtually the same as in Kruskal’s algorithm,
namely

steps :: State -> [State]
steps (t, es) = [(add e t, es') | (e, es') <- picks es, safeEdge e t]

except for different definitions of add and safeEdge. This time, safeEdge determines
whether an edge has exactly one endpoint in the tree:

safeEdge :: Edge -> Tree -> Bool
safeEdge e t = elem (source e) (nodes t) /= elem (target e) (nodes t)

The function add adds an edge to a tree:

add :: Edge -> Tree -> Tree
add e (vs, es) = if elem (source e) vs then (target e : vs, e : es)
                 else (source e : vs, e : es)
The greedy algorithm is derived in the same way as Kruskal’s algorithm. First of all,
define

MCC = MinWith cost . map fst . until (all done) (concatMap steps) . wrap

We then have

extract (until done gstep sx) <- MCC sx

provided we can show that there exists a tree t for which

t <- MCC (gstep sx) ∧ t <- MCC sx

As before, we can establish the greedy condition by defining gstep to select a safe 
edge of minimum weight. Assuming the list of edges is in ascending order of weight, 
that means 

gstep (t, e:es) = if safeEdge e t then (add e t, es) else keep e (gstep (t, es))
  where keep e (t, es) = (t, e:es)

The function keep is needed because, unlike Kruskal’s algorithm, an edge that
cannot be added to a tree at one step could still be added at a later step when the 
tree has grown some more. 

The proof of the greedy condition is also very similar to that for Kruskal, but
it is worth spelling out the details. Consider an incomplete state sx = (t1, es), so
more edges can be added to t1, and let e be an element of es of lightest weight that
is a safe edge for t1. Without loss of generality, suppose source e is a vertex of t1
and target e is not. Now let t2 <- MCC sx. If t2 contains the edge e, then t2 can be
constructed by choosing e as a first step. Hence t2 <- MCC (gstep sx) and we can
choose t = t2 to satisfy the greedy condition. Otherwise, t2 does not contain e and
adding e to t2 would create a (unique) cycle. This time we have to be more careful
in selecting an edge of t2 that can be replaced by e. Observe that among the edges of
the cycle there has to be an edge e' such that source e' is a vertex in t1 and target e'
is not. If this were not the case, then e would not be a safe edge for t1. Replacing e'
by e in t2 gives another tree t3 whose cost is no greater than cost t2. And now we
can take t = t3 to satisfy the greedy condition.

The greedy algorithm can be expressed in almost the same way as the first version
of Kruskal’s algorithm:

prim :: Graph -> Tree
prim g = fst (until done gstep (start g))
  where done (t, es) = (length (nodes t) == length (nodes g))

As an alternative we can write

prim g = fst (apply (n - 1) gstep (start g))
  where n = length (nodes g)
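
To see the pieces working together, here is a self-contained sketch of this first version of Prim's algorithm. The concrete representations of Graph, Tree and Edge, the helper apply, the explicit sorting of edges in the starting state, and the small example graph are all assumptions made for illustration; only safeEdge, add, gstep and prim follow the definitions given above.

```haskell
import Data.List (sortOn)

type Vertex = Int
type Weight = Int
type Edge   = (Vertex, Vertex, Weight)
type Graph  = ([Vertex], [Edge])   -- assumed representation
type Tree   = Graph
type State  = (Tree, [Edge])

nodes :: Graph -> [Vertex]
nodes = fst

edges :: Graph -> [Edge]
edges = snd

source, target :: Edge -> Vertex
source (u, _, _) = u
target (_, v, _) = v

weight :: Edge -> Weight
weight (_, _, w) = w

-- An edge is safe when exactly one endpoint is already in the tree.
safeEdge :: Edge -> Tree -> Bool
safeEdge e t = elem (source e) (nodes t) /= elem (target e) (nodes t)

-- Add a safe edge together with whichever endpoint is new to the tree.
add :: Edge -> Tree -> Tree
add e (vs, es) = if elem (source e) vs then (target e : vs, e : es)
                 else (source e : vs, e : es)

-- Take the lightest safe edge; unsafe edges are kept for later steps.
gstep :: State -> State
gstep (t, e : es)
  | safeEdge e t = (add e t, es)
  | otherwise    = keep (gstep (t, es))
  where keep (t', es') = (t', e : es')
gstep (_, []) = error "gstep: no safe edge (is the graph connected?)"

-- Apply a function a fixed number of times.
apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply k f = f . apply (k - 1) f

prim :: Graph -> Tree
prim g = fst (apply (n - 1) gstep (start g))
  where n = length (nodes g)
        -- Edges are sorted here so that gstep sees them in ascending order.
        start h = (([head (nodes h)], []), sortOn weight (edges h))

-- A hypothetical four-vertex graph.
example :: Graph
example = ([1, 2, 3, 4], [(1, 2, 4), (1, 3, 1), (2, 3, 2), (2, 4, 5), (3, 4, 8)])
```

On this example the edge (1,2,4) is unsafe at the third step, because both endpoints are already in the tree, so it is set aside by keep while (2,4,5) is chosen instead.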

with a somewhat more efficient definition of the termination condition. However,
the main problem with this version of Prim’s algorithm is that it is not very efficient.
At step k, when the tree has k vertices and k-1 edges, the number of edges that
may have to be checked before finding a safe edge is O(e - k). That means gstep
takes O(k (e - k)) steps, because safeEdge takes O(k) steps. Summing over all steps
gives a running time of

$$\sum_{k=1}^{n-1} O(k\,(e-k)) = \sum_{k=1}^{n-1} O(k\,e) = O(e\,n^2)$$

steps for Prim’s algorithm, compared with O(en) steps for the first version of
Kruskal’s algorithm. The bound can be improved by using an efficient implementation
of sets with a membership test that takes logarithmic time. That reduces
the times for safeEdge and add to O(log k) steps, and the total time to O(en log n)
steps. But the result is still worse than Kruskal’s algorithm.
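
For readers who want the intermediate arithmetic behind the first bound, the sum can be estimated as follows. This is only a sketch, and it assumes the graph is connected, so that e is at least n - 1 and every term k (e - k) is nonnegative.

```latex
\sum_{k=1}^{n-1} k\,(e-k)
  \;\le\; e \sum_{k=1}^{n-1} k
  \;=\; \frac{e\,n\,(n-1)}{2}
  \;=\; O(e\,n^2)
```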

In fact, we can reduce the running time of Prim’s algorithm to O(n^2) steps by
reducing the number of edges that have to be considered at each step. The idea is to
maintain for each vertex v off the tree at most one edge, that edge being one of least
weight that connects v to some tree vertex. When the tree is updated with a new
vertex, the candidate edges for the next step can be updated as well. The number
of candidate edges is therefore O(n) at each stage. The result will be a version of
Prim’s algorithm that takes O(n^2) steps both for sparse and for dense graphs, which
is better than the O(e log n + n^2) bound for Kruskal’s algorithm.

To implement the idea we need two arrays. First we suppose that vertices are 
named by integers in the range 1 to n for some n, so they can be used as array 
indices. States are redefined to be 

type State = (Links, [Vertex])
type Links = Array Vertex (Vertex, Weight)

The first component of a state is now an array rather than a tree. The entry for a
vertex v not in the tree is a pair (u, w) for which the edge (u, v, w) is a lightest edge
linking v to any vertex u on the tree. The vertex u is called the parent of v. We
therefore define

parent :: Links -> Vertex -> Vertex
parent ls v = fst (ls ! v)

weight :: Links -> Vertex -> Weight
weight ls v = snd (ls ! v)

If there is no edge connecting v to a tree vertex, then the parent of v is v itself and 
the associated weight is infinitely large. Apart from the root, the parent of a vertex v 
in the tree is the vertex of the tree to which v was linked when it was added to the 
tree. 

The second component of a state is a list of vertices, not edges. These are the
fresh vertices, vertices not yet on the tree. For example, in the following state the
tree vertices are [1,2,3,4,5] while [6,7,8] are fresh:

The dashed line connecting 4 and 6 indicates that the lightest edge connecting the
vertex 6 to the tree is the edge (4,6,2); similarly, the lightest edge connecting 7 to
the tree is (5,7,5). Vertex 8 has no edges connecting it to the tree. In this state the
first component is the array

The second array is a fixed one and is needed simply to be able to determine the
weight of an edge in constant time rather than having to search through the edges
each time:

type Weights = Array (Vertex, Vertex) Weight

We will leave the definition of weights :: Graph -> Weights as Exercise 9.9. The final
version of Prim’s algorithm can now be expressed as follows:

prim :: Graph -> Tree
prim g = extract (apply (n - 1) (gstep wa) (start n))
  where n = length (nodes g)
        wa = weights g

In the initial state, all vertices are fresh and all except vertex 1 have infinite weights:

start :: Nat -> State
start n = (array (1, n) ((1, (1, 0)) : [(v, (v, maxInt)) | v <- [2..n]]), [1..n])

maxInt :: Int
maxInt = maxBound

The value maxInt, the largest possible element of Int, represents an infinite weight.
Vertex 1 has zero weight and the default parent vertex for each entry is the vertex
itself. The function gstep is defined by

gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (_, v) = minimum [(weight ls v, v) | v <- vs]
        vs' = filter (/= v) vs
        ls' = accum better ls [(u, (v, wa ! (u, v))) | u <- vs']
        better (v1, w1) (v2, w2) = if w1 <= w2 then (v1, w1) else (v2, w2)

The function gstep selects a fresh vertex v closest to the tree and updates the links by
replacing each parent of a fresh vertex u with v and the weight of the edge (u, v) if
the replacement yields a lighter link to the tree. Finally, the function extract extracts
the final tree from the final state:

extract :: State -> Tree
extract (ls, _) = (indices ls, [(u, v, w) | (v, (u, w)) <- assocs ls, v /= 1])

Each gstep operation takes O(n) steps, so the revised definition of prim
takes O(n^2) steps. As we will see in the following section, essentially the same
algorithm can be used for computing shortest paths on a directed graph. 
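
The array-based version can be run end to end with the sketch below. The definition of weights shown here is only one possible choice (the text leaves it as Exercise 9.9) and is an assumption, as are the pair representation of Graph, the helper apply, and the example graph; start, gstep and extract follow the definitions above.

```haskell
import Data.Array

type Vertex  = Int
type Weight  = Int
type Edge    = (Vertex, Vertex, Weight)
type Graph   = ([Vertex], [Edge])   -- assumed representation, vertices 1..n
type Tree    = Graph
type Links   = Array Vertex (Vertex, Weight)
type Weights = Array (Vertex, Vertex) Weight
type State   = (Links, [Vertex])

maxInt :: Int
maxInt = maxBound

-- One possible weight matrix for an undirected graph (an assumption):
-- absent edges get weight maxInt, and each edge is entered both ways.
weights :: Graph -> Weights
weights (vs, es) =
  accumArray (\_ w -> w) maxInt ((1, 1), (n, n))
             (concat [[((u, v), w), ((v, u), w)] | (u, v, w) <- es])
  where n = length vs

weight :: Links -> Vertex -> Weight
weight ls v = snd (ls ! v)

start :: Int -> State
start n = (array (1, n) ((1, (1, 0)) : [(v, (v, maxInt)) | v <- [2 .. n]]), [1 .. n])

-- Select the fresh vertex closest to the tree, then relink the rest.
gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (_, v) = minimum [(weight ls u, u) | u <- vs]
        vs'    = filter (/= v) vs
        ls'    = accum better ls [(u, (v, wa ! (u, v))) | u <- vs']
        better (v1, w1) (v2, w2) = if w1 <= w2 then (v1, w1) else (v2, w2)

extract :: State -> Tree
extract (ls, _) = (indices ls, [(u, v, w) | (v, (u, w)) <- assocs ls, v /= 1])

apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply k f = f . apply (k - 1) f

prim :: Graph -> Tree
prim g = extract (apply (n - 1) (gstep wa) (start n))
  where n  = length (fst g)
        wa = weights g

-- A hypothetical four-vertex graph.
example :: Graph
example = ([1, 2, 3, 4], [(1, 2, 4), (1, 3, 1), (2, 3, 2), (2, 4, 5), (3, 4, 8)])
```

Each edge of the result is reported as (parent, vertex, weight), so the tree for this example links 2 to 3, 3 to 1, and 4 to 2, with total weight 8.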


9.5 Single-source shortest paths 

We turn now to directed graphs and shortest paths. The notion of getting from one 
point to another involves a direction of travel, so graphs with directed edges are an 
appropriate basis for studying shortest routes. There is no loss in moving to digraphs 
because a graph can always be modelled as a digraph by representing each edge 
as two directed edges, each with the same weight. Typical of the problems we can 
solve using a shortest-paths algorithm is: given a network of streets in a city that 
may include one-way streets, what is the shortest route by car from one address to 
another? 

Normally, the cost of a route is the sum of the lengths (that is, the weights) of 
the edges along the route, but there are examples where other aggregation functions 
are required. For instance, if you are a hiker and the routes are footpaths, the best 
route may be one with the shallowest uphill climb. Each footpath is associated 
with a measure of its gradient and the cost of a route is the maximum of the 
individual gradients along the path. In such a case, the best route for an unfit walker 
is one that minimises this cost. As a dual example, some roads may have height 
restrictions owing to bridges over the roads. Here the best route for the driver of a 
high-sided vehicle is one that maximises the minimum of the heights of the bridges 
along a route. In what follows we will focus on distances and their sums, but the 
algorithm we will describe, a version of Dijkstra’s algorithm, is easily adapted to 
other situations. 
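
The three notions of route cost mentioned above differ only in how the measures along a route are combined and in whether the result is minimised or maximised. The following sketch makes that concrete; the type Measure, the function names, and the two sample routes are hypothetical and chosen purely for illustration.

```haskell
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

type Measure = Int   -- a length, a gradient, or a height restriction

-- Usual notion: the cost of a route is the sum of its edge lengths.
routeLength :: [Measure] -> Measure
routeLength = sum

-- Hiker: the cost of a route is its steepest gradient (to be minimised).
steepest :: [Measure] -> Measure
steepest = maximum

-- High-sided vehicle: the value of a route is its lowest bridge
-- (to be maximised).
clearance :: [Measure] -> Measure
clearance = minimum

-- Two hypothetical routes, each given by the measures of its edges.
routes :: [[Measure]]
routes = [[3, 9, 2], [5, 5, 5, 5]]

shortestRoute, gentlestRoute, tallestRoute :: [Measure]
shortestRoute = minimumBy (comparing routeLength) routes
gentlestRoute = minimumBy (comparing steepest) routes
tallestRoute  = maximumBy (comparing clearance) routes
```

The first route is the shortest by sum (14 against 20), but the second is preferable under both of the other measures.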

Finding a shortest path P from A to B necessarily involves finding a shortest path
from A to every node along P: shortest paths have shortest sub-paths. In the worst
case, the route to B may be discovered only after finding the routes from A to all
other nodes in the network. In other words, the algorithm may have to compute a
shortest-paths spanning tree (SPST) rooted at A. Note that a SPST is a different
animal from a MCST. In this section we will concentrate on finding a SPST for any
digraph for which there is a path from the given source vertex to every other vertex.
The algorithm can be modified to terminate as soon as the shortest path to a given
destination is discovered.

Until now we have assumed nothing about edge weights except that, for simplicity, 
they were integers. But from now on we need the assumption that no weight is 
negative. With negative weights there is the possibility of having cycles with negative 
costs, and that allows paths with infinite negative costs. Some algorithms can cope 
with negative weights (as long as there are no cycles with negative costs), but not 
the algorithm we describe. We will see why we need this assumption later on. 

Another feature of the problem is that, unlike the case of a MCST, the optimality 
of a SPST rooted at A cannot be expressed in terms of a single numerical value. 
The cost of a tree depends on the path costs from A to all other vertices on the tree. 
The obvious way to state that one tree is no worse than another is to require that 
the distances to every vertex in the first tree are no greater than the corresponding 
distances in the second. This requirement defines a preorder on trees but not a total 
preorder. 
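
A small sketch shows why this comparison is a preorder but not a total one. Here a tree is represented only by its list of distances from the source, which is an assumption made for illustration, as are the name noWorse and the three sample lists.

```haskell
type Distance = Int

-- ds1 `noWorse` ds2 holds when every distance in ds1 is at most the
-- corresponding distance in ds2 (lists are assumed to have equal length).
noWorse :: [Distance] -> [Distance] -> Bool
noWorse ds1 ds2 = and (zipWith (<=) ds1 ds2)

-- Hypothetical distance lists for three spanning trees of one graph.
treeA, treeB, treeC :: [Distance]
treeA = [0, 7, 9]
treeB = [0, 8, 9]
treeC = [0, 6, 10]
```

Here treeA is no worse than treeB, but treeA and treeC are incomparable: each is strictly better than the other at some vertex.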

A final point to bear in mind is that the algorithm we will discuss is not the one 
found in a car navigation system for computing real-life shortest routes. Actual road 
networks are based on real distances, and adjacent towns in the network are generally
closer to one another than towns separated by long routes. That means certain heuristics can
be employed for finding shortest routes quickly. The resulting algorithm, called the 
A* search algorithm, will be discussed in Chapter 16. 


9.6 Dijkstra's algorithm 

Our shortest-paths spanning tree algorithm, a version of Dijkstra’s algorithm, uses 
essentially the same definition of states as in the final version of Prim’s algorithm, 
except that edge weights in the links array are replaced by distances, where the 
distance from the source vertex 1 to vertex v is the sum of the weights of the edges 
along the path from 1 to v: 

type State = (Links, [Vertex])
type Links = Array Vertex (Vertex, Distance)
type Distance = Int

parent :: Links -> Vertex -> Vertex
parent ls v = fst (ls ! v)

distance :: Links -> Vertex -> Distance
distance ls v = snd (ls ! v)


Figure 9.2 An example digraph

For example, the state in which the fresh vertices are [6,7,8] is represented by the array

vertex      1    2    3    4    5    6    7    8
parent      1    1    1    3    3    4    5    8
distance    0    7   15   26   19   28   24    ∞

In particular, the fresh vertex closest to the tree is vertex 7, with a distance from 
vertex 1 of 15 + 4 + 5 = 24. 

Except for one or two small changes, Dijkstra’s algorithm is identical to Prim’s 
algorithm: 

dijkstra :: Graph -> Tree
dijkstra g = extract (apply (n - 1) (gstep wa) (start n))
  where n = length (nodes g)
        wa = weights g

The function weights has to be defined differently from how it was in Prim’s 
algorithm because we are now dealing with a directed graph (see Exercise 9.9). The 
functions start and extract are exactly the same as in Prim’s algorithm, and gstep is 
defined by 

gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (d, v) = minimum [(distance ls v, v) | v <- vs]
        vs' = filter (/= v) vs
        ls' = accum better ls [(u, (v, sum d (wa ! (v, u)))) | u <- vs']
          where sum d w = if w == maxInt then maxInt else d + w
        better (v1, d1) (v2, d2) = if d1 <= d2 then (v1, d1) else (v2, d2)

vertex               1    2    3    4    5    6

  1      parent      1    2    3    4    5    6
         distance    0    ∞    ∞    ∞    ∞    ∞

  2      parent      1    1    1    4    5    6
         distance    0    7   10    ∞    ∞    ∞

  3      parent      1    1    2    2    5    6
         distance    0    7    9   16    ∞    ∞

  6      parent      1    1    2    2    3    3
         distance    0    7    9   16   13   10

  5      parent      1    1    2    2    6    3
         distance    0    7    9   16   11   10

  4      parent      1    1    2    5    6    3
         distance    0    7    9   12   11   10

Figure 9.3 A sequence of five greedy steps


Each application of gstep selects a fresh vertex v of minimum distance from the
source vertex 1. There are n - 1 fresh vertices, so gstep is applied n - 1 times. After
selecting v, the function gstep updates the parents and distances for each fresh vertex
u whenever there is a path to u going through v that is shorter. For instance, in the
example above, adding 7 as a new tree node, with distance 24, we may find an edge
(7,6,1), so the distance from 1 to the fresh vertex 6 can be reduced to 24 + 1, which
is better than the current best distance 28. Note the necessity for the function sum
in the definition of gstep. The reason is that if (v, u) is not an edge, so its weight is
maxInt, then the new distance of u from the source vertex should also be maxInt.
That requires d + maxInt = maxInt for any finite distance d, an equation that does
not hold in Haskell.
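
The point about infinite weights can be checked directly. In the sketch below the saturating addition is called plus rather than sum, simply to avoid a clash with the Prelude function of that name; the wrap-around behaviour noted in the comments is what GHC does for Int and is stated here as an assumption about the compiler, not as something the language definition guarantees.

```haskell
type Distance = Int

maxInt :: Int
maxInt = maxBound

-- Saturating addition: an infinite edge weight yields an infinite distance.
plus :: Distance -> Int -> Distance
plus d w = if w == maxInt then maxInt else d + w

-- Plain addition for comparison. Under GHC, Int arithmetic wraps around,
-- so adding a positive distance to maxInt gives a negative number.
naive :: Distance -> Int -> Distance
naive d w = d + w
```

With GHC, naive 24 maxInt is negative and would therefore be mistaken for a very short distance, whereas plus 24 maxInt remains maxInt.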

The function extract extracts the spanning tree as a graph, but a better result is to 
return the actual paths from the source node to each other vertex: 

type Path = ([Vertex], Distance)

extract :: State -> [Path]
extract (ls, _) = [(reverse (getPath ls v), distance ls v) | v <- indices ls]

getPath ls v = if u == v then [u] else v : getPath ls u
  where u = parent ls v
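
As a quick check of extract and getPath, the sketch below applies them to the links array in the last row of Figure 9.3 (parents 1 1 2 5 6 3 and distances 0 7 9 12 11 10). The name finalLinks and the use of listArray to build the array are assumptions made for illustration.

```haskell
import Data.Array

type Vertex   = Int
type Distance = Int
type Links    = Array Vertex (Vertex, Distance)
type State    = (Links, [Vertex])
type Path     = ([Vertex], Distance)

parent :: Links -> Vertex -> Vertex
parent ls v = fst (ls ! v)

distance :: Links -> Vertex -> Distance
distance ls v = snd (ls ! v)

-- Follow parent links back to the source, which is its own parent.
getPath :: Links -> Vertex -> [Vertex]
getPath ls v = if u == v then [u] else v : getPath ls u
  where u = parent ls v

extract :: State -> [Path]
extract (ls, _) = [(reverse (getPath ls v), distance ls v) | v <- indices ls]

-- The final links array of Figure 9.3.
finalLinks :: Links
finalLinks = listArray (1, 6) [(1, 0), (1, 7), (2, 9), (5, 12), (6, 11), (3, 10)]
```

The entry for vertex 4 in extract (finalLinks, []) is the path [1,2,3,6,5,4] with distance 12.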

Let us walk through an example to show how Dijkstra’s algorithm works out in 
practice. Consider the digraph of Figure 9.2 in which n = 6. There is a path from the 
source vertex 1 to every other vertex, so it is possible to construct a SPST rooted at 
vertex 1. Figure 9.3 shows the sequence of n — 1 greedy steps. The vertex on the left 
is the vertex found at the beginning of each step. The final distances in Figure 9.3 are
the costs of the shortest paths from the source vertex 1 to all vertices. In particular,
the shortest route to vertex 4 has cost 12 and is along the path [1,2,3,6,5,4]. The
spanning tree is shown in Figure 9.4.

Figure 9.4 The shortest-paths spanning tree from vertex 1
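
The walk-through can be replayed with the following self-contained sketch. The edge list is hypothetical: it was chosen only so that the run reproduces the rows of Figure 9.3, and it is not claimed to be the full digraph of Figure 9.2. The definition of weights for a digraph, the helper apply, and the decision to return the links array rather than a tree are likewise assumptions made for illustration.

```haskell
import Data.Array

type Vertex   = Int
type Weight   = Int
type Distance = Int
type Edge     = (Vertex, Vertex, Weight)
type Graph    = ([Vertex], [Edge])   -- assumed representation, vertices 1..n
type Links    = Array Vertex (Vertex, Distance)
type Weights  = Array (Vertex, Vertex) Weight
type State    = (Links, [Vertex])

maxInt :: Int
maxInt = maxBound

-- Weight matrix for a digraph; absent edges get weight maxInt (an assumption).
weights :: Graph -> Weights
weights (vs, es) =
  accumArray (\_ w -> w) maxInt ((1, 1), (n, n)) [((u, v), w) | (u, v, w) <- es]
  where n = length vs

distance :: Links -> Vertex -> Distance
distance ls v = snd (ls ! v)

start :: Int -> State
start n = (array (1, n) ((1, (1, 0)) : [(v, (v, maxInt)) | v <- [2 .. n]]), [1 .. n])

-- Select the closest fresh vertex v, then improve routes that go through v.
gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (d, v) = minimum [(distance ls u, u) | u <- vs]
        vs'    = filter (/= v) vs
        ls'    = accum better ls [(u, (v, plus d (wa ! (v, u)))) | u <- vs']
        plus x w = if w == maxInt then maxInt else x + w
        better (v1, d1) (v2, d2) = if d1 <= d2 then (v1, d1) else (v2, d2)

apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply k f = f . apply (k - 1) f

-- Returns the final links array, to make the result easy to inspect.
dijkstra :: Graph -> Links
dijkstra g = fst (apply (n - 1) (gstep wa) (start n))
  where n  = length (fst g)
        wa = weights g

-- Hypothetical edges, chosen only to match the distances in Figure 9.3.
example :: Graph
example = ([1 .. 6],
           [(1, 2, 7), (1, 3, 10), (2, 3, 2), (2, 4, 9),
            (3, 5, 4), (3, 6, 1), (6, 5, 1), (5, 4, 1)])
```

Evaluating elems (dijkstra example) gives the (parent, distance) pairs of the last row of Figure 9.3.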

It remains to prove that Dijkstra’s algorithm works correctly. We did not give a 
definition of the list of all shortest-paths spanning trees, so a proof based on the 
generic greedy condition of the previous chapter is not available to us. Instead, we 
give a direct proof. We show that, at each step, the distance recorded in the state for 
every vertex on the tree is indeed the shortest distance from the source. In symbols, 
if 

(ls, vs) = apply k (gstep wa) (start n)

then for all tree vertices v (those not in vs), we have

distance ls v = shortest g v

where shortest g v is the cost of the shortest path in the graph g from the source
vertex to v. The proof of the claim is by induction on k. The base case k = 0 is
immediate since the only tree vertex is the source vertex 1 and the shortest path is
the empty path with distance 0. For the induction step, let v be the vertex selected
by gstep and let P be a path of shortest distance in the graph from the source vertex
to v. Such a path has to contain a fresh vertex because v itself is fresh. Suppose that
(x, y, w) is the first edge in P for which y is fresh. Since x is a tree vertex, and the
distances to tree vertices are never changed once they are set, we have by induction
that

distance ls x = shortest g x

After selecting x, the function gstep updates the distances to each fresh vertex,
including y, and, since such distances are never increased, we have

distance ls y <= distance ls x + w = shortest g x + w = shortest g y

Hence, since computed distances are never shorter than the shortest possible distance,
we have distance ls y = shortest g y.

We can now reason

     distance ls v
<=     { definition of v as a closest fresh vertex, as y is fresh }
     distance ls y
=      { above }
     shortest g y
<=     { since P passes through y }
     shortest g v

So, by the same argument as before, distance ls v = shortest g v. Note that it is in
the very last step that we exploit the fact that edge weights are not negative, so the
initial section of the path P to y cannot cost more than P itself.

We have introduced Dijkstra’s algorithm as a variant of Prim’s algorithm, but 
there is another way of formulating Dijkstra’s algorithm, namely as a version of 
breadth-first search, a topic we will take up in Part Six (see Chapter 16 for details). 


9.7 The jogger’s problem 

Finally, here is one application of Dijkstra’s algorithm. Consider the plight of a 
reluctant jogger who, while willing to undertake exercise, wishes to suffer as little 
unpleasantness as possible. The jogger is confronted with a network of footpaths, 
each of which possesses some nonnegative measure of undesirability, say its length. 
Beginning at some specified point, called ‘home’, the jogger wishes to plan a circular 
route, no footpath being traversed more than once, of minimum total undesirability. 
We will suppose that the undesirability of a footpath is independent of the direction 
of travel, so we are dealing with an undirected network of footpaths. Such a route 
will be a cycle in the network, that is, a circular path consisting of distinct vertices 
(footpath junctions) as well as distinct edges (why?). 

Abstractly put, the problem is to determine, given a graph G = (V, E) and a
specified home vertex a, a cycle that begins and ends at a and is of minimum total
cost, where the cost of each individual footpath is some given positive value. Since
no footpath can be travelled more than once, there must be at least three different
edges in the cycle. In what follows we assume G is a connected graph and that such
a cycle exists. For example, the graph

has five possible cycles from the source node 1, each of which can be travelled in
either direction: 1231, 12341, 1241, 12431, 1341.

A simple method for computing a minimum jog J in a graph G is to observe that
the path P defined by J from a to the last vertex, say x, before returning to a via the
edge (x, a) has to be a shortest path from a to x in a modified graph G(x) in which
the edge between a and x is removed. If there were a shorter path, then there would
be a shorter cycle. That means Dijkstra’s algorithm can be used on G(x) to find P,
provided that each undirected edge in G is replaced by two directed edges with the
same weight. A best possible jog can then be found by running Dijkstra’s algorithm
on all graphs G(x) for which x is incident on a. Since Dijkstra’s algorithm can take
Θ(n^2) steps, this method can take Θ(n^3) steps if there are Θ(n) edges incident on a.
The algorithm works equally well for both graphs and digraphs. Nevertheless, there 
seems to be a lot of duplicated effort in the method, so it is sensible to ask whether 
there is a way of using Dijkstra’s algorithm just once to solve the jogger’s problem. 
The answer is yes, as we will now see. 

Let T be a shortest-paths spanning tree of a graph G rooted at vertex a. We are
going to show that there is some minimum jog J of G with the property that all the
constituent edges of J are in T except one. There has to be at least one such edge
since T is acyclic. This property is called the single-edge property.

Let J be a minimum jog with the fewest number of non-T edges. Suppose x is
the first vertex in J such that the edge from x is not in T, and let y be the last vertex
such that the edge to y is not in T. The case x = a is not excluded, nor is y = a, but
x and y have to be different vertices. Since the graph is undirected, the roles of x
and y are dual. Here is a picture in which solid lines are paths in T:



Our aim is to show that the dashed line is a single non-T edge. Using the notation
$(u \cdots v)_G$ to mean a path from u to v with vertices and edges in G, we have

$$J = (a \cdots x)_T \, (x \cdots y)_J \, (y \cdots a)_T$$

Since J is a cycle in which no vertex, apart from a, is repeated, x and y have no
common ancestor in T apart from a; in symbols,

$$(a \cdots x)_T \cap (y \cdots a)_T = \{a\}$$

We now show that the assumption that there is some intermediate vertex z on the
path $(x \cdots y)_J$ leads to a contradiction. Suppose such a z exists. Here is the picture:

Consider the jog J', defined by

$$J' = (a \cdots x)_T \, (x \cdots z)_J \, (z \cdots a)_T$$

The jog J' has fewer non-T edges than J. Furthermore, since T is a shortest-paths
spanning tree of an undirected graph, we have

$$\mathrm{cost}\,(z \cdots a)_T \le \mathrm{cost}\,\bigl((z \cdots y)_J \, (y \cdots a)_T\bigr)$$

Hence

$$\begin{aligned}
\mathrm{cost}\,J' &= \mathrm{cost}\,(a \cdots x)_T + \mathrm{cost}\,(x \cdots z)_J + \mathrm{cost}\,(z \cdots a)_T \\
&\le \mathrm{cost}\,(a \cdots x)_T + \mathrm{cost}\,(x \cdots z)_J + \mathrm{cost}\,(z \cdots y)_J + \mathrm{cost}\,(y \cdots a)_T \\
&\le \mathrm{cost}\,J
\end{aligned}$$

This contradicts the assumption that J is a shortest jog with the fewest non-T edges.
So no such vertex z exists.

Now we are ready to describe the algorithm. Suppose, as usual, that the home
vertex is vertex 1 and let T be a shortest-paths spanning tree with source vertex 1. It
follows from the single-edge property that we have to find an edge e = (x, y), with
x < y, such that: (i) e is not an edge of T; (ii) e creates a cycle containing vertex 1
when added to T ; and (iii) e minimises the sum of the distance in T from vertex 1 
to vertex x, the weight of e, and the distance in T from vertex y to vertex 1 (which, 
since the graph is undirected, is the same as the distance from 1 to y). In fact it is 
not necessary to insist that (x,y) is a real edge, because if it is not then its weight 
is infinite and cannot minimise the sum. Call any pair (x,y) satisfying the first two 
properties a candidate pair. 

We can identify a candidate pair by considering two cases. In the first case x = 1,
so y cannot be connected directly to vertex 1 in T; that is, the parent of y in T cannot
be 1. In the second case, neither x nor y is vertex 1. In this case define the subtrees
of T to be those trees that result from deleting all edges of T incident on vertex 1.
In this case, (x, y) is a candidate pair if x and y belong to different subtrees. Call
the root of the subtree to which x belongs the root of x. Then a pair of vertices is a
candidate pair in the second case if they have different roots.

For example, in the spanning tree

the roots of the two subtrees are 2 and 3, and the candidate pairs are (1,4), (1,5),
(1,6), (2,3), (2,4), (2,5), and (2,6). However, none of the pairs (3,6), (4,5), (5,6)
is a candidate, because these pairs have the same root, namely 3. The root of a vertex
can be computed from the links array constructed by Dijkstra’s algorithm:

root :: Links -> Vertex -> Vertex
root ls v = if p == 1 then v else root ls p
  where p = parent ls v

A better method, left as an exercise, is to install a third component in the links 
array, one that computes the root associated with each vertex, and to update this 
component when processing vertices. In this way we can ensure that evaluation of 
root takes constant time. Now we can define 

candidate :: Links -> (Vertex, Vertex) -> Bool
candidate ls (x, y) = if x == 1 then parent ls y /= 1 else root ls x /= root ls y

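
The functions root and candidate can be exercised on a small links array. The array below is hypothetical: it describes one tree in which vertices 2 and 3 hang off vertex 1 and vertices 4, 5 and 6 lie below 3, which is consistent with the roots and candidate pairs listed in the example but is not claimed to be the tree pictured there. The distances are placeholders that play no part.

```haskell
import Data.Array

type Vertex   = Int
type Distance = Int
type Links    = Array Vertex (Vertex, Distance)

parent :: Links -> Vertex -> Vertex
parent ls v = fst (ls ! v)

-- Root of the subtree containing v once the edges at vertex 1 are deleted.
-- Intended for vertices other than 1.
root :: Links -> Vertex -> Vertex
root ls v = if p == 1 then v else root ls p
  where p = parent ls v

candidate :: Links -> (Vertex, Vertex) -> Bool
candidate ls (x, y) =
  if x == 1 then parent ls y /= 1 else root ls x /= root ls y

-- Hypothetical links: parents 1 1 1 3 3 5, with placeholder distances.
exampleLinks :: Links
exampleLinks = listArray (1, 6) [(1, 0), (1, 1), (1, 1), (3, 2), (3, 2), (5, 3)]

candidates :: [(Vertex, Vertex)]
candidates =
  [(x, y) | x <- [1 .. 6], y <- [x + 1 .. 6], candidate exampleLinks (x, y)]
```

For this array, candidates consists of exactly the seven pairs (1,4), (1,5), (1,6), (2,3), (2,4), (2,5) and (2,6).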
The jogger’s problem can now be solved by defining 

jog :: Graph -> [Edge]
jog g = getPath ls wa (bestEdge ls wa)
  where ls = fst (apply (n - 1) (gstep wa) (start n))
        wa = weights g
        n = length (nodes g)

The functions gstep and start are the same as in Dijkstra’s algorithm, while weights
is the same as in Prim’s algorithm because the graph is undirected. The function
bestEdge is defined by

bestEdge :: Links -> Weights -> (Vertex, Vertex)
bestEdge ls wa =
  minWith cost [(x, y) | x <- [1..n], y <- [x+1..n], candidate ls (x, y)]
  where n = snd (bounds ls)
        cost (x, y) = if w == maxInt then maxInt
                      else distance ls x + w + distance ls y
          where w = wa ! (x, y)

If (x, y) is not an edge, so its weight is maxInt, then cost (x, y) should also be maxInt.
The function getPath is defined by

getPath :: Links -> Weights -> (Vertex, Vertex) -> [Edge]
getPath ls wa (x, y) =
  reverse (path x) ++ [(x, y, wa ! (x, y))] ++ [(v, u, w) | (u, v, w) <- path y]
  where path x = if x == 1 then [] else (p, x, wa ! (p, x)) : path p
          where p = parent ls x

Building the array and determining the best candidate edge takes O(n^2) steps, so
the jogger’s problem can be solved in this time. 
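
Putting everything together gives the following runnable sketch of the jogger's algorithm. Several details are assumptions made for illustration: the pair representation of Graph, the symmetric definition of weights, the helper apply, the use of minimumBy (comparing cost) in place of minWith cost, and the small example graph.

```haskell
import Data.Array
import Data.List (minimumBy)
import Data.Ord (comparing)

type Vertex   = Int
type Weight   = Int
type Distance = Int
type Edge     = (Vertex, Vertex, Weight)
type Graph    = ([Vertex], [Edge])   -- assumed representation, vertices 1..n
type Links    = Array Vertex (Vertex, Distance)
type Weights  = Array (Vertex, Vertex) Weight
type State    = (Links, [Vertex])

maxInt :: Int
maxInt = maxBound

-- Symmetric weight matrix; absent edges get weight maxInt (an assumption).
weights :: Graph -> Weights
weights (vs, es) =
  accumArray (\_ w -> w) maxInt ((1, 1), (n, n))
             (concat [[((u, v), w), ((v, u), w)] | (u, v, w) <- es])
  where n = length vs

parent :: Links -> Vertex -> Vertex
parent ls v = fst (ls ! v)

distance :: Links -> Vertex -> Distance
distance ls v = snd (ls ! v)

start :: Int -> State
start n = (array (1, n) ((1, (1, 0)) : [(v, (v, maxInt)) | v <- [2 .. n]]), [1 .. n])

gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (d, v) = minimum [(distance ls u, u) | u <- vs]
        vs'    = filter (/= v) vs
        ls'    = accum better ls [(u, (v, plus d (wa ! (v, u)))) | u <- vs']
        plus x w = if w == maxInt then maxInt else x + w
        better (v1, d1) (v2, d2) = if d1 <= d2 then (v1, d1) else (v2, d2)

apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply k f = f . apply (k - 1) f

root :: Links -> Vertex -> Vertex
root ls v = if p == 1 then v else root ls p
  where p = parent ls v

candidate :: Links -> (Vertex, Vertex) -> Bool
candidate ls (x, y) =
  if x == 1 then parent ls y /= 1 else root ls x /= root ls y

-- Cheapest candidate pair; non-edges cost maxInt and so are never preferred.
bestEdge :: Links -> Weights -> (Vertex, Vertex)
bestEdge ls wa =
  minimumBy (comparing cost)
            [(x, y) | x <- [1 .. n], y <- [x + 1 .. n], candidate ls (x, y)]
  where n = snd (bounds ls)
        cost (x, y) = if w == maxInt then maxInt
                      else distance ls x + w + distance ls y
          where w = wa ! (x, y)

-- Tree path from 1 to x, the edge (x,y), then the tree path from y back to 1.
getPath :: Links -> Weights -> (Vertex, Vertex) -> [Edge]
getPath ls wa (x, y) =
  reverse (path x) ++ [(x, y, wa ! (x, y))] ++ [(v, u, w) | (u, v, w) <- path y]
  where path z = if z == 1 then [] else (p, z, wa ! (p, z)) : path p
          where p = parent ls z

jog :: Graph -> [Edge]
jog g = getPath ls wa (bestEdge ls wa)
  where ls = fst (apply (n - 1) (gstep wa) (start n))
        wa = weights g
        n  = length (fst g)

-- A hypothetical network of footpaths.
example :: Graph
example = ([1 .. 4], [(1, 2, 1), (1, 3, 2), (1, 4, 10), (2, 3, 2), (3, 4, 3)])
```

For this network the candidate pairs are (1,4), (2,3) and (2,4); the pair (2,3) is cheapest, giving the jog 1, 2, 3, 1 of total cost 5.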


9.8 Chapter notes 

For an interesting history of the minimum-cost spanning tree problem, see [4]. The 
four proofs of Cayley’s formula (see Answer 9.3) can be found in [1]. The Steiner 
tree problem mentioned in the introduction is studied in [7]. 

A fast Union-Find algorithm is presented and analysed in [8]. Kruskal’s algorithm
was described in [5] and Prim’s algorithm in [6]. Prim’s algorithm should perhaps
be called Jarník’s algorithm because it was invented earlier by Vojtěch Jarník in
1930 and rediscovered by Prim in 1957, and again by Dijkstra in 1959. Alternative
descriptions of these algorithms can be found in most textbooks on algorithm design.
Dijkstra’s shortest-paths algorithm was described in a short article in [3]. The
jogger’s problem is taken from [2], where a second version involving digraphs is
also discussed.

Exercises

Exercise 9.1 Some quick questions on graphs and digraphs: 

1. Why can a digraph of n vertices contain up to n^2 edges, while a graph can contain
no more than n(n-1)/2 edges?

2. Can a digraph have a cycle of length two? 

3. Why is it the case that in an acyclic graph there is at most one path between any 
two vertices? Is this true of acyclic digraphs? 

4. Can a labelled graph have more than one edge between two vertices? 

5. Why is a spanning forest of a connected graph necessarily a spanning tree? 

6. Why does a spanning tree of a connected graph of n vertices have exactly n - 1
edges?

7. What is the maximum number of edges in a longest possible cycle of a graph of 
n vertices? 

Exercise 9.2 Assuming vertices are labelled from 1 to n, define functions 

toAdj :: Graph -> AdjArray
toGraph :: AdjArray -> Graph

for converting a digraph into its adjacency representation and vice versa. 

Exercise 9.3 Draw all the spanning trees for the following graph: 



Exercise 9.4 Assign weights to the edges AB and CD in the following graph to 
show that the path from A to D in a MCST is not necessarily the shortest path from 
A to D. 



Exercise 9.5 Here is a possible divide-and-conquer algorithm for computing a
MCST. Divide the vertices V of the graph into two sets V1 and V2 that differ in size
by at most one. Let Ei be the set of edges whose endpoints are in Vi (for i = 1, 2). Recursively
find a MCST for G1 = (V1, E1) and G2 = (V2, E2). Finally, select a lightest edge,
one of whose endpoints is in V1 and the other in V2, and add it to the two MCSTs to
form a single MCST. Does this algorithm work?

Exercise 9.6 The function steps in the specification of Kruskal’s algorithm does
not discard an edge if it creates a cycle in a given forest, even though it will also
create a cycle in any subsequent forest. Write down a version of steps that does
discard such edges.

Exercise 9.7 What is the output of Kruskal’s algorithm if the input is not a connected
graph? Adapt the algorithm to find the minimum-cost spanning forest of an
unconnected graph.

Exercise 9.8 Why is the test t1 /= t2 in the specification of Kruskal’s algorithm
sufficient to determine whether two trees in a forest are different? After all, the trees
t1 = ([1,2], [(1,2,3)]) and t2 = ([2,1], [(1,2,3)]) are the same tree but the test t1 /= t2
returns True. As a supplementary question, can the test be made more efficient?

Exercise 9.9 Construct the function weights as used in Prim’s algorithm when the 
input is an undirected graph. What is the definition when the input is a directed 
graph? 

Exercise 9.10 Consider the problem of finding a maximum-cost spanning tree. Is 
there a greedy algorithm for this problem? 

Exercise 9.11 Here is the specification of a shortest-paths spanning tree:

spst <- MinWith cost . spats

The function spats returns all spanning trees of a directed graph. Give a definition of
spats. To define cost, we need to compute the path from the source vertex to every
other vertex in the tree. Define a function

pathsFrom :: Vertex -> Tree -> [Path]

where Path is a synonym for [Edge], such that pathsFrom 1 t returns the paths from
the source vertex 1 to every other vertex in the tree. Finally, define

cost :: Tree -> [Distance]

so that cost t = [d_2, ..., d_n], where d_v is the distance from the source vertex 1 to
vertex v.

Exercise 9.12 To find the shortest path between A and B, suppose we simultaneously
compute shortest routes from A to various towns, and shortest routes from
various towns to B, stopping when some intermediate town C has been found in
both directions. Does this idea work? 

Exercise 9.13 Give an example to show that Dijkstra’s algorithm does not work 
with negative lengths even if there are no negative-length cycles. 

Exercise 9.14 How would you modify Dijkstra’s algorithm to stop as soon as the 
shortest path to a given vertex is found?

Exercise 9.15 In the jogger’s problem we can install a third component in the links
array to represent the root associated with each vertex:

type Links = Array Vertex (Vertex, Vertex, Distance)
type State = (Links, [Vertex])

parent :: Links -> Vertex -> Vertex
parent ls v = u where (u, _, _) = ls ! v

root :: Links -> Vertex -> Vertex
root ls v = r where (_, r, _) = ls ! v

distance :: Links -> Vertex -> Distance
distance ls v = d where (_, _, d) = ls ! v

The starting state is then given by

start :: Nat -> State
start n =
  (array (1, n) ((1, (1, 1, 0)) : [(v, (v, v, maxInt)) | v <- [2..n]]), [1..n])

Give the modified definition of gstep :: Weights -> State -> State.


Answers 

Answer 9.1 Some quick answers: 

1. Because in a digraph every vertex may contain an edge to every vertex, including 
itself, while in a graph there are no edges from a vertex to itself, and at most one 
edge between two vertices. 

2. Yes, if the digraph contains both the edges (u,v) and (v,u).

3. Suppose there were two different paths between u and v. Let these two paths first 
meet at some vertex w after u, where w could be v. The two paths P and Q from 
u to w contain no edge in common, so the path P followed by the reverse of Q 
creates a cycle. In an acyclic digraph there can be many paths that connect two 
vertices. 

4. It is certainly possible to have both (u,v,w1) and (u,v,w2) as labelled edges
when w1 ≠ w2.

5. Because if the forest consisted of two trees there would be a vertex in each tree 
connected by an edge (or the graph would not be connected), and so the edges in 
the forest would not be a maximal set. 

6. Because a tree with n nodes has exactly n - 1 edges, a result that is easily proved
by induction. 

7. A maximum-length cycle will pass through every vertex once apart from the two
endpoints, which gives a total number of n edges.

Answer 9.2 For a directed graph g we can define

toAdj :: Graph -> AdjArray
toAdj g = accumArray (flip (:)) [] (1, n) [(u, (v, w)) | (u, v, w) <- edges g]
  where n = length (nodes g)

toGraph :: AdjArray -> Graph
toGraph a = (indices a, [(u, v, w) | (u, vws) <- assocs a, (v, w) <- vws])

For an undirected graph the last argument to accumArray has to be replaced by

[(u, (v, w)) | (u, v, w) <- edges g] ++ [(v, (u, w)) | (u, v, w) <- edges g]

Answer 9.3 There are 16 spanning trees: 



In fact, the number of possible spanning trees for n vertices is given by Cayley’s
formula n^(n-2). Four proofs of this remarkable result are given in [1].

Answer 9.4 One fully labelled graph is as follows: 



The shortest path from A to D has total length 9, while the path in the MCST has 
length 10. 

Answer 9.5 No. Take a triangle 



Let V1 = [A] and V2 = [B,C]. The divide-and-conquer algorithm returns the edges
AB and BC with cost 11, but the MCST has edges AB and AC with cost 3. 

Answer 9.6 Simply change the definition of steps to read 
steps :: State -> [State]
steps (ts, es) = [(add e ts, es') | e : es' <- tails es, safeEdge e ts]

Answer 9.7 Kruskal’s algorithm will return an error because it attempts to take the
head of an empty list. The algorithm for finding a minimum-cost spanning forest
(MCSF) is as follows:

mcsf :: Graph -> Forest
mcsf g = fst (until (null . snd) gstep s)
  where s = ([([v], []) | v <- nodes g], sortOn weight (edges g))

This time the algorithm searches all the unused edges and terminates when this list
is empty.

Answer 9.8 Because the test t1 ≠ t2 is applied only to two identical trees or to two
trees with disjoint sets of nodes. If, however, the two trees are identical, the test will
take linear time in the size of the trees. A faster definition is

notEqual t1 t2 = head (nodes t1) /= head (nodes t2)

Answer 9.9 One method is to set up an array with infinite weights and then update 
the array with the actual edge weights: 

weights g = listArray ((1, 1), (n, n)) (repeat maxInt)
              // [((u, v), w) | (u, v, w) <- edges g]
              // [((v, u), w) | (u, v, w) <- edges g]
  where n = length (nodes g)

When the graph is directed the definition simplifies to

weights :: Graph -> Array (Vertex, Vertex) Weight
weights g = listArray ((1, 1), (n, n)) (repeat maxInt)
              // [((u, v), w) | (u, v, w) <- edges g]
  where n = length (nodes g)

Answer 9.10 Yes, both Kruskal’s and Prim’s algorithm can be adapted by negating
all the edge weights. In symbols,

maxWith cost = minWith newcost

where the new cost newcost is defined by

newcost :: Tree -> Int
newcost = sum . map (negate . weight) . edges

With Kruskal’s algorithm that means edges are listed in decreasing order of weight.

Answer 9.11 The definition of spats is exactly the same as in Prim’s algorithm but
with a modified definition of add and safeEdge to take account of the fact that edges
are directed:

spats :: Graph -> [Tree]
spats g = map fst (apply (n - 1) (concatMap steps) [s])
  where n = length (nodes g)
        s = (([head (nodes g)], []), edges g)

steps :: (Tree, [Edge]) -> [(Tree, [Edge])]
steps (t, es) = [(add e t, es') | (e, es') <- picks es, safeEdge e t]

add :: Edge -> Tree -> Tree
add e (vs, es) = (target e : vs, e : es)

safeEdge :: Edge -> Tree -> Bool
safeEdge e t = elem (source e) ns && not (elem (target e) ns)
  where ns = nodes t

The definition of pathsFrom is

pathsFrom u t =
  [] : [(u, v, w) : es | (u', v, w) <- edges t, u' == u, es <- pathsFrom v t]

Finally, cost is defined by

cost = map (sum . map weight) . sortOn (target . last) . tail . pathsFrom 1

Answer 9.12 Not obviously. For example, consider the graph



The closest vertex from A is D with cost 2. The closest vertex to B is E with cost 2.
The next closest vertex from A is C with cost 5, and the next closest vertex to B is C 
with cost 5. That gives the answer ACB with cost 10, but the path ADEB has cost 8. 

Answer 9.13 A good example is the following digraph:



The distances from 1 to the vertices [1,2,3,4] are [0,1,5,2], but the greedy algorithm
returns the following distances after three greedy steps (the parent array is not
shown):

            1    2    3    4
start       0    ∞    ∞    ∞
update 1    0    2    5    ∞
update 2    0    2    5    3
update 4    0    2    5    3

The distances to vertices 2 and 4 are incorrectly calculated as 2 and 3. 

Answer 9.14 Change the main definition to 

dijkstra :: Graph -> Vertex -> Path
dijkstra g v = path (until done (gstep wa) (start n))
  where path (ls, vs) = (reverse (getPath ls v), distance ls v)
        done (ls, vs) = v `notElem` vs
        n = length (nodes g)
        wa = weights g


Answer 9.15 The modified definition is 

gstep :: Weights -> State -> State
gstep wa (ls, vs) = (ls', vs')
  where (d, v) = minimum [(distance ls v, v) | v <- vs]
        vs' = filter (/= v) vs
        ls' = accum better ls [(u, (v, new u, sum d (wa ! (v, u)))) | u <- vs']
        sum d w = if w == maxInt then maxInt else d + w
        better (v1, r1, d1) (v2, r2, d2) =
          if d1 <= d2 then (v1, r1, d1) else (v2, r2, d2)
        new u = if v == 1 then u else root ls v


We turn now to a powerful strategy for solving an optimisation problem when a
greedy algorithm is not possible. The strategy is called thinning, and an algorithm
that employs it a thinning algorithm.

The principle at work behind thinning is really quite simple: if maintaining a 
single best candidate at each step is not guaranteed to deliver a best candidate 
overall, then maybe one can get away with maintaining a subset of the candidates. 
Provided we can quickly identify those partial candidates that can never grow into 
fully fledged best candidates, we can remove them from further consideration. The 
key factor in the success of the enterprise is the size of the set that remains. When 
the set of all possible candidates is exponential in the length of the input, we want 
a subset that is much smaller, say one of linear or quadratic size. We have already 
encountered one simple instance of thinning in the computation of extents in the 
TeX problem of Chapter 7. There we thinned an infinite set of candidates into a
finite one simply in order to make extents a computable function. 

Thinning algorithms therefore sit between the extremes of greedy algorithms 
and exhaustive search algorithms. However, with some exceptions, thinning has 
not traditionally been suggested as a separate design technique in the algorithms 
literature. Instead, problems that are susceptible to thinning have more often been 
solved by a related technique, called dynamic programming, a topic we will pursue 
in Part Five. As we will see, dynamic programming can be thought of as a generalisation
of the divide-and-conquer strategy. Although the parentage of thinning
algorithms and dynamic programming algorithms is different, both techniques can
often be applied to one and the same problem. What makes thinning important is
that many algorithms traditionally regarded as paradigms of dynamic programming
can be formulated, often more effectively, as thinning algorithms.

In this chapter we explore the basic theory of thinning and discuss three simple
examples of the idea. As with most of the algorithms we have seen so far, the 
key step that makes thinning a viable design technique involves fusion. For fusion 
to work, we will need to reason about refinement and another nondeterministic 
function, called ThinBy. The first section enumerates the essential properties that we 
need this function to possess. The chapter ends with a general thinning algorithm 
that captures most of the essential points about how to introduce thinning. 


10.1 Theory 

The theory behind thinning algorithms is all about a nondeterministic function

ThinBy :: (a -> a -> Bool) -> [a] -> [a]

This function takes a comparison function and a list as arguments and returns a list
as its result. It is specified by two properties: firstly, if ys is a possible output of the
expression ThinBy (≼) xs, that is, if

ys ← ThinBy (≼) xs

then ys is a subsequence of xs; and secondly, for every x in xs we can find an element
y in ys such that y ≼ x. In symbols,

ys ⊑ xs ∧ ∀x ∈ xs : ∃y ∈ ys : y ≼ x

where ys ⊑ xs means that ys is a subsequence of xs. It is assumed throughout that ≼
is a preorder, a relation which is reflexive and transitive. But we do not assume ≼
is a total preorder; that is, we do not assume that for all x and y either x ≼ y or y ≼ x
holds. Any definition of the form

x ≼ y = (cost x <= cost y)

for a total function cost :: Ord b => a -> b would mean that ≼ is a total preorder.
Working with total preorders turns out to be too restrictive for the purposes of
thinning, which is why we choose as our basic construct thinning by a comparison
function rather than thinning with a cost function.

Here is an example. Suppose ≼ is defined on pairs of numbers by

(a, b) ≼ (c, d) = (a >= c) && (b <= d)

Then ≼ is a preorder, in fact a partial order because it is also anti-symmetric,
meaning that x ≼ y ∧ y ≼ x ⇒ x = y for all x and y; but ≼ is not a total preorder. For
example, (4,3) and (5,4) are not comparable under ≼. Now consider the expression

ThinBy (≼) [(1,2),(4,3),(2,3),(5,4),(3,1)]

This expression has four possible refinements:

[(4,3),(5,4),(3,1)] 

[(4,3),(2,3),(5,4),(3,1)] 

[(1,2),(4,3),(5,4),(3,1)] 

[(1,2),(4,3),(2,3),(5,4),(3,1)] 
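
The four refinements above can be checked by brute force. The sketch below is not part of the original development: it assumes the preorder on pairs is (a,b) ≼ (c,d) = a ≥ c ∧ b ≤ d, gives it the hypothetical name dominates, and enumerates every subsequence that satisfies the two defining properties of ThinBy. It is exponential in the length of the list and is meant only for tiny inputs.

```haskell
import Data.List (subsequences)

type Pair = (Int, Int)

-- Assumed preorder of the running example: (a,b) is below (c,d)
-- when a >= c and b <= d. The name `dominates` is ours, not the text's.
dominates :: Pair -> Pair -> Bool
dominates (a, b) (c, d) = a >= c && b <= d

-- Every possible result of ThinBy: the subsequences ys of xs in which
-- each x of xs has some y in ys with y below x in the preorder.
thinnings :: (a -> a -> Bool) -> [a] -> [[a]]
thinnings leq xs =
  [ys | ys <- subsequences xs, all (\x -> any (`leq` x) ys) xs]

example :: [Pair]
example = [(1, 2), (4, 3), (2, 3), (5, 4), (3, 1)]
```

In GHCi, thinnings dominates example produces four lists, among them the shortest one [(4,3),(5,4),(3,1)] and the input itself.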

The most effective implementation of ThinBy would be to return a subsequence of
shortest length, but computing such a sequence (see the exercises) can involve a
quadratic number of evaluations of ≼. Instead we prefer sub-optimal implementations
of ThinBy that take linear time. One legitimate but pointless implementation is
to take thinBy (≼) = id. However, the refinement law id ← ThinBy (≼) is useful in
establishing other properties of ThinBy.

One sensible implementation of ThinBy is to define

thinBy (≼) = foldr bump []
  where bump x [] = [x]
        bump x (y : ys)
          | x ≼ y     = x : ys
          | y ≼ x     = y : ys
          | otherwise = x : y : ys

This function processes a list from right to left. Each new element x can ‘bump’ the
current first element y if x ≼ y, or be bumped by y if y ≼ x. Otherwise it is added to
the list. For example,

thinBy (≼) [(1,2),(4,3),(2,3),(5,4),(3,1)] = [(1,2),(4,3),(5,4),(3,1)]

In this example, thinning is more effective if the list elements are in ascending order
of first component, or ascending order of second component:

thinBy (≼) [(1,2),(2,3),(3,1),(4,3),(5,4)] = [(3,1),(4,3),(5,4)]
thinBy (≼) [(3,1),(1,2),(2,3),(4,3),(5,4)] = [(3,1),(4,3),(5,4)]

We can maintain order when building candidates in a step-by-step manner by
merging sublists at each step rather than full-scale sorting. That is the primary
reason why we insist that thinning a list xs should return a subsequence of xs - the
relative order of the elements is not changed. There are other sensible definitions of
thinBy, including one that processes elements from left to right; see the exercises
for examples.
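
The three evaluations above can be replayed directly. This is a minimal sketch, assuming the preorder on pairs is (a,b) ≼ (c,d) = a ≥ c ∧ b ≤ d; the preorder is passed as an ordinary function argument, and the name dominates is a hypothetical label for it.

```haskell
-- Right-to-left thinning: each new element either bumps the current
-- head, is bumped by it, or is placed in front of it.
thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy leq = foldr bump []
  where
    bump x [] = [x]
    bump x (y : ys)
      | x `leq` y = x : ys
      | y `leq` x = y : ys
      | otherwise = x : y : ys

-- Assumed preorder on pairs (the name is ours).
dominates :: (Int, Int) -> (Int, Int) -> Bool
dominates (a, b) (c, d) = a >= c && b <= d
```

Because bump only ever compares a new element with the head of the list built so far, comparable elements have to be adjacent to be thinned, which is why the sorted inputs give the shorter results.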

In addition to the identity law, there are six other basic laws about thinning, some 
of which are more useful in calculations than others. Proofs of the laws are relegated 
to the exercises so that we can concentrate here on what they say. The first law is 
that 

ThinBy (≼) = ThinBy (≼) . ThinBy (≼)

In words, thinning a list twice has the same possible outcomes as thinning it once. 
The law is interesting theoretically but not of much practical use. 

By contrast, the next law is used as the very first step in every derivation that 
follows. It is called thin introduction, and it asserts that 

MinWith cost = MinWith cost . ThinBy (≼)

provided x ≼ y ⇒ cost x ≤ cost y. Thin introduction is the law that lets us restate an
optimisation problem as a problem about thinning. 

The next law is called thin elimination: 

wrap . MinWith cost ← ThinBy (≼)

provided cost x ≤ cost y ⇒ x ≼ y. Thin elimination is dual to thin introduction, and
so is its proviso. 

The next law also makes an appearance in virtually every calculation about 
thinning involving concat. It is the distributive law, and it states that 

ThinBy (≼) . concat = ThinBy (≼) . concatMap (ThinBy (≼))

In words, one can thin the concatenation of a list of lists by thinning each list, 
concatenating the results, and thinning again. Without the final thinning, the law 
would be only a refinement. That is, 

concatMap (ThinBy (≼)) ← ThinBy (≼) . concat

This version is not strong enough to be of much practical help. 

The next law is the thin-map law, which comes in two flavours. Firstly,

map f . ThinBy (≼) ← ThinBy (≼) . map f

provided x ≼ y ⇒ f x ≼ f y. Secondly,

ThinBy (≼) . map f ← map f . ThinBy (≼)

provided f x ≼ f y ⇒ x ≼ y. It follows that

map f . ThinBy (≼) = ThinBy (≼) . map f

if x ≼ y ⇔ f x ≼ f y. Appeal to the thin-map law often relies on context. For example,

map f . ThinBy (≼) . filter p = ThinBy (≼) . map f . filter p

provided p x ∧ p y ⇒ (x ≼ y ⇔ f x ≼ f y). We will see an example of this context-sensitive
version in the following section.

Figure 10.1 A layered network

The final law is the thin-filter law: 

ThinBy (≼) . filter p = filter p . ThinBy (≼)

provided (x ≼ y ∧ p y) ⇒ p x.

We will come back to the theory of thinning after first exploring some sample
problems to see what thinning can contribute to the study of efficient functional 
algorithms. 


10.2 Paths in a layered network 

Our first problem is a shortest-paths problem. Consider the digraph in Figure 10.1. 
Reading from top to bottom, the graph consists of a number of layers, each layer 
consisting of a number of vertices and each edge going from one layer to the one 
beneath. It so happens in the example that there are the same number of vertices 
in each layer, but this is not a requirement. Each edge is given by a triple (u,v,w),
where u is the source vertex of the edge, v is the target vertex, and w is a numerical 
weight, not necessarily positive. We assume that there is at least one path from some 
vertex in the top layer to some vertex in the bottom layer (in the example there 
are 27 such paths). The problem is to find one with minimum total weight. For the 
example the answer is the path [(4,7,2), (7,11,2), (11,16,3)] of total weight 7. It 
is easy to see that Dijkstra’s algorithm can be used to solve this problem, at least if 
the weights are nonnegative. Imagine another vertex U with zero-weight edges to 
each of the vertices in the top layer, and another vertex V with zero-weight edges 
from each of the vertices in the bottom layer. Then a shortest path from U to V
includes a shortest path from the top layer to the bottom layer. Dijkstra’s algorithm
takes O(n²) steps, where n is the total number of vertices in the network, but it is
possible to reduce this time with a thinning algorithm.

To calculate the thinning algorithm, suppose the layered network is given by a 
list of lists of edges, each list describing the edges between two adjacent layers: 

type Net    = [[Edge]]
type Path   = [Edge]
type Edge   = (Vertex, Vertex, Weight)
type Vertex = Int
type Weight = Int

We will make use of the following selector functions:

source, target :: Edge -> Vertex
source (u, v, w) = u
target (u, v, w) = v

weight :: Edge -> Weight
weight (u, v, w) = w

Our problem is to compute mcp (a minimum-cost path), specified by

mcp :: Net -> Path
mcp ← MinWith cost . paths

The cost function on paths is defined by

cost :: Path -> Int
cost = sum . map weight

The function paths can be defined in terms of the Cartesian-product function cp:

cp :: [[a]] -> [[a]]
cp = foldr op [[]] where op xs yss = [x : ys | x <- xs, ys <- yss]

For example,

cp ["abc", "de", "f"] = ["adf", "aef", "bdf", "bef", "cdf", "cef"]

We have 

paths :: Net -> [Path]
paths = filter connected . cp

where connected is the predicate

connected :: Path -> Bool
connected [] = True
connected (e : es) = linked e es && connected es

and linked is the predicate

linked :: Edge -> Path -> Bool
linked e1 [] = True
linked e1 (e2 : es) = target e1 == source e2

As a first step we can fuse filter and cp to arrive at another definition of paths:

paths = foldr step [[]]
  where step es ps = [e : p | e <- es, p <- ps, linked e p]

Details of the fusion step are left as an exercise. We can also rewrite step in the
equivalent form

step es ps = concat [cons e ps | e <- es]
  where cons e ps = [e : p | p <- ps, linked e p]
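
The two definitions of paths can be compared on a small instance. The following sketch assumes the types and selectors of this section; the names pathsSpec and pathsFused are introduced here only to keep the two versions apart, and demoNet is a hypothetical two-stage network, not the one in Figure 10.1.

```haskell
type Vertex = Int
type Weight = Int
type Edge = (Vertex, Vertex, Weight)
type Path = [Edge]
type Net = [[Edge]]

source, target :: Edge -> Vertex
source (u, _, _) = u
target (_, v, _) = v

-- Cartesian product of a list of lists.
cp :: [[a]] -> [[a]]
cp = foldr op [[]]
  where op xs yss = [x : ys | x <- xs, ys <- yss]

linked :: Edge -> Path -> Bool
linked _ [] = True
linked e1 (e2 : _) = target e1 == source e2

connected :: Path -> Bool
connected [] = True
connected (e : es) = linked e es && connected es

-- Specification: generate every choice of edges, then filter.
pathsSpec :: Net -> [Path]
pathsSpec = filter connected . cp

-- Fused version: only connected paths are ever built.
pathsFused :: Net -> [Path]
pathsFused = foldr step [[]]
  where step es ps = [e : p | e <- es, p <- ps, linked e p]

-- Hypothetical network: vertices {1,2}, {3,4}, {5,6} in three layers.
demoNet :: Net
demoNet =
  [ [(1, 3, 4), (1, 4, 1), (2, 4, 2)]
  , [(3, 5, 1), (4, 5, 6), (4, 6, 2)]
  ]
```

On demoNet both functions return the same five paths in the same order, but pathsSpec first builds all nine edge combinations.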

Now we arrive at the heart of the problem. A greedy algorithm, one that maintains 
a single path at each step, is not possible because the source of a minimum-cost 
path at one level may not be among the target vertices of the edges at the next level 
up. So we introduce thinning. The thin-introduction law says we can rewrite the 
specification as 

mcp ← MinWith cost . ThinBy (≼) . paths

provided we choose ≼ so that p1 ≼ p2 ⇒ cost p1 ≤ cost p2. An appropriate choice
for ≼ is the partial preorder

(≼) :: Path -> Path -> Bool
p1 ≼ p2 = source (head p1) == source (head p2) && cost p1 <= cost p2

In words, when building paths from bottom to top, there is no point in keeping a 
path if there is another path with the same source vertex and lower cost. 

The aim now is to fuse ThinBy (^) and paths. That means finding a function 
tstep so that the fusion condition 

tstep es (ThinBy (≼) ps) ← ThinBy (≼) (step es ps)

holds. We can establish the fusion condition by arguing as follows:

   ThinBy (≼) (step es ps)
=    { definition of step }
   ThinBy (≼) (concat [cons e ps | e <- es])
=    { distributive law }
   ThinBy (≼) (concat [ThinBy (≼) (cons e ps) | e <- es])
=    { claim: see below }
   ThinBy (≼) (concat [cons e (ThinBy (≼) ps) | e <- es])
=    { definition of step }
   ThinBy (≼) (step es (ThinBy (≼) ps))
→    { defining tstep es ps ← ThinBy (≼) (step es ps) }
   tstep es (ThinBy (≼) ps)

We have shown that

foldr tstep [[]] ← ThinBy (≼) . foldr step [[]]

where

tstep es ps ← ThinBy (≼) (step es ps)

The claim in the third step is the assertion

ThinBy (≼) (cons e ps) = cons e (ThinBy (≼) ps)

Here is the proof:

   ThinBy (≼) . cons e
=    { definition of cons }
   ThinBy (≼) . map (e:) . filter (linked e)
=    { thin-map law; see below }
   map (e:) . ThinBy (≼) . filter (linked e)
=    { thin-filter law; see below }
   map (e:) . filter (linked e) . ThinBy (≼)
=    { definition of cons }
   cons e . ThinBy (≼)

The thin-filter law is justified because

p1 ≼ p2 ∧ linked e p2 ⇒ linked e p1

The thin-map law is justified because

e:p1 ≼ e:p2 ⇔ p1 ≼ p2

provided linked e p1 and linked e p2. The appeal to the thin-map law in the above
calculation therefore relies on context.

In summary, we have the final algorithm

mcp = minWith cost . foldr tstep [[]]
  where tstep es ps = thinBy (≼) [e : p | e <- es, p <- ps, linked e p]

where minWith is some implementation of MinWith and thinBy is some suitable
implementation of ThinBy. As a further optimisation we can tuple paths with their
costs to avoid recomputation of cost.
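
The pieces of the final algorithm can be assembled into a loadable sketch. It makes several assumptions: minWith is taken to return the leftmost minimum, thinBy is the bump-based definition of Section 10.1, the preorder is given the hypothetical name leqPath, and demoNet is an invented network whose edge lists are already grouped by source vertex. The unthinned paths function is included only to cross-check the answer.

```haskell
type Vertex = Int
type Weight = Int
type Edge = (Vertex, Vertex, Weight)
type Path = [Edge]
type Net = [[Edge]]

source, target :: Edge -> Vertex
source (u, _, _) = u
target (_, v, _) = v

weight :: Edge -> Weight
weight (_, _, w) = w

cost :: Path -> Int
cost = sum . map weight

linked :: Edge -> Path -> Bool
linked _ [] = True
linked e1 (e2 : _) = target e1 == source e2

-- One possible minWith: the leftmost element of minimum cost.
-- It fails on an empty list, so the net must contain at least one path.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- The bump-based thinBy of Section 10.1.
thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy leq = foldr bump []
  where
    bump x [] = [x]
    bump x (y : ys)
      | x `leq` y = x : ys
      | y `leq` x = y : ys
      | otherwise = x : y : ys

-- Preorder on non-empty paths: same source vertex, no greater cost.
leqPath :: Path -> Path -> Bool
leqPath p1 p2 = source (head p1) == source (head p2) && cost p1 <= cost p2

mcp :: Net -> Path
mcp = minWith cost . foldr tstep [[]]
  where tstep es ps = thinBy leqPath [e : p | e <- es, p <- ps, linked e p]

-- Unthinned enumeration, used only to cross-check the result.
paths :: Net -> [Path]
paths = foldr step [[]]
  where step es ps = [e : p | e <- es, p <- ps, linked e p]

-- Hypothetical network, not the one in Figure 10.1.
demoNet :: Net
demoNet =
  [ [(1, 3, 4), (1, 4, 1), (2, 4, 2)]
  , [(3, 5, 1), (4, 5, 6), (4, 6, 2)]
  ]
```

On demoNet, mcp keeps at most one path per source vertex after each step and returns a path of cost 3, which agrees with the minimum over all five connected paths.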

There is one further and important optimisation. Thinning will be most effective if
each list of edges is sorted so that edges with the same source vertex appear together.
Then thinning with the definition of thinBy given in the first section will produce 
just one path for every source vertex. For example, in the network of Figure 10.1 
the first step will produce the four singleton paths 

[[(9,13,4)], [(10,14,2)], [(11,16,3)], [(12,16,7)]] 

Each additional step will also produce exactly four paths because each layer has
four vertices. As to the running time, observe that, because the number of paths
maintained at each step is at most the number of vertices in the current layer, the
cost of each step is proportional to at most the product of the number of edges
between two layers and the number of vertices in the lower layer. If each layer has
no more than k vertices, then the running time is O(ek) steps, where e is the total
number of edges. If there are d layers, then e ≤ (d-1)k². Furthermore, the total
number n of vertices is at most dk. The thinning algorithm therefore takes O(dk³)
steps, while Dijkstra’s algorithm takes O(d²k²) steps. The thinning algorithm is
therefore superior when the network is deeper than it is wide. By renaming vertices
so that the vertices in each layer are labelled with 1 to k, and using an array to store
the best paths at each step, it is possible to shave a factor of k off this running time,
giving an optimal O(dk²) algorithm. This extension is left as Exercise 10.14.
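
The comparison between the two running times can be laid out as a short calculation. The labels T_thin and T_Dijkstra are introduced here for convenience; both are worst-case bounds for a network of d layers with at most k vertices each, so the ratio in the last line is only indicative.

```latex
\begin{align*}
T_{\text{thin}} &= O(ek) = O\bigl((d-1)k^{2}\cdot k\bigr) = O(dk^{3}), \\
T_{\text{Dijkstra}} &= O(n^{2}) = O\bigl((dk)^{2}\bigr) = O(d^{2}k^{2}), \\
\frac{dk^{3}}{d^{2}k^{2}} &= \frac{k}{d}.
\end{align*}
```

The ratio k/d is below 1 exactly when d > k, which is the sense in which the thinning algorithm wins on networks that are deeper than they are wide.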


10.3 Coin-changing revisited 

For the next problem we revisit the coin-changing problem of Chapter 7. Recall that 
the greedy algorithm is not guaranteed to produce the smallest number of coins for 
all possible denominations. In particular, the greedy algorithm does not work for the 
United Regions (UR) denominations (see Exercise 7.16). However, the UR is a rich 
country and can afford automated change-giving systems. Which algorithm should 
we design to guarantee a minimum number of coins is given for any possible set of 
denominations? 

One answer is a thinning algorithm. To set things up for a thinning step we need 
to replace the recursive definition of mktuples given in Chapter 7 with a definition 
using an appropriate higher-order function such as a fold of some kind. As we will 
see in Part Five, working directly with recursive definitions leads to thinking about 
dynamic programming solutions, but thinning typically involves a fusion step with 
a higher-order function such as a fold. For compatibility with the other algorithms 
in this chapter we choose foldr, so denominations are considered in order from right
to left. We still want to consider denominations in decreasing order of value, so we 
take currencies in increasing order; for example 

ukds = [1,2,5,10,20,50,100,200]
urds = [1,2,5,15,20,50,100] 

Here are the relevant definitions: 

type Denom   = Nat
type Coin    = Nat
type Residue = Nat
type Count   = Nat
type Tuple   = ([Coin], Residue, Count)

And here are the selector functions we will need:

coins :: Tuple -> [Coin]
coins (cs, _, _) = cs

residue :: Tuple -> Residue
residue (_, r, _) = r

count :: Tuple -> Count
count (_, _, k) = k

This time a tuple consists of three things: a list of coin counts [c_k, c_(k-1), ..., c_1] for a
given list of denominations [d_1, d_2, ..., d_k], the residual amount r after giving these
coins in change, and a count of the number of coins used. The function mktuples is
redefined as follows:

mktuples :: Nat -> [Denom] -> [Tuple]
mktuples n = foldr (concatMap . extend) [([], n, 0)]

extend :: Denom -> Tuple -> [Tuple]
extend d (cs, r, k) = [(cs ++ [c], r - c * d, k + c) | c <- [0 .. r `div` d]]

We start with no coins and a residue n, the amount of change required. At each 
step the next lower denomination is considered, and every possible choice for a 
number of coins of this denomination is considered. The new residue and count 
are calculated and the algorithm proceeds to the next step. Evaluation of mktuples 
returns many more values than the one in Chapter 7 because it returns all the partial 
tuples, including those with a non-zero residue. For example, 

length (mktuples 256 ukds) = 10640485

The function mkchange is now specified by

mkchange :: Nat -> [Denom] -> [Coin]
mkchange n ← coins . MinWith cost . mktuples n

where

cost :: Tuple -> (Residue, Count)
cost t = (residue t, count t)

A candidate with minimum cost is one whose residue is as small as possible and, 
among such candidates, one with minimum count. Since we are assuming there is a 
denomination with value 1, there are candidates with zero residue, so a minimum- 
cost candidate has zero residue and minimum count. 
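
Read literally, the specification is already an exhaustive-search program. The sketch below makes that reading executable under some assumptions: Nat is modelled as Int, minWith is taken to return the leftmost minimum, and the name mkchangeSpec is introduced here to distinguish this version from the thinning algorithm developed next. It is exponential and suitable only for small amounts.

```haskell
type Nat = Int -- assumption: Nat modelled as Int
type Denom = Nat
type Coin = Nat
type Residue = Nat
type Count = Nat
type Tuple = ([Coin], Residue, Count)

coins :: Tuple -> [Coin]
coins (cs, _, _) = cs

residue :: Tuple -> Residue
residue (_, r, _) = r

count :: Tuple -> Count
count (_, _, k) = k

-- All ways of using c coins of denomination d, for c from 0 upwards.
extend :: Denom -> Tuple -> [Tuple]
extend d (cs, r, k) = [(cs ++ [c], r - c * d, k + c) | c <- [0 .. r `div` d]]

-- Denominations are consumed from right to left, largest first.
mktuples :: Nat -> [Denom] -> [Tuple]
mktuples n = foldr (concatMap . extend) [([], n, 0)]

cost :: Tuple -> (Residue, Count)
cost t = (residue t, count t)

minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- Exhaustive-search reading of the specification.
mkchangeSpec :: Nat -> [Denom] -> [Coin]
mkchangeSpec n = coins . minWith cost . mktuples n

urds :: [Denom]
urds = [1, 2, 5, 15, 20, 50, 100]
```

For example, mkchangeSpec 30 urds selects two coins of value 15, where a greedy choice of the largest coin first would give 20 + 5 + 5. The counts are listed from the largest denomination down.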

As in the layered network problem, we now introduce a thinning step, writing 

mkchange n ← coins . MinWith cost . ThinBy (≼) . mktuples n

where the preorder ≼ has to be chosen to satisfy

t1 ≼ t2 ⇒ cost t1 ≤ cost t2

The right choice of ≼ is the following one:

(≼) :: Tuple -> Tuple -> Bool
t1 ≼ t2 = (residue t1 == residue t2) && (count t1 <= count t2)

In words, there is no point in keeping a tuple in play if there is another tuple whose 
residue is the same but whose count is smaller. That sounds reasonable, but it might 
be thought that a stronger statement is true, namely that there is no point in keeping 
a tuple if there is another tuple whose residue and count are both smaller. However, 
this statement is false (see Exercise 10.16). 

The aim now is to fuse ThinBy (^) and mktuples. For this to work we need to 
verify the fusion condition 

tstep d (ThinBy (≼) ts) ← ThinBy (≼) (step d ts)

for some function tstep satisfying

tstep d ts ← ThinBy (≼) (step d ts)

That means we have to verify the condition

ThinBy (≼) (step d (ThinBy (≼) ts)) ← ThinBy (≼) (step d ts)        (10.1)

where step = concatMap . extend. Following exactly the same path as in the layered
network problem, we reason

   ThinBy (≼) (step d ts)
=    { definition of step }
   ThinBy (≼) (concatMap (extend d) ts)
=    { distributive law }
   ThinBy (≼) (concatMap (ThinBy (≼) . extend d) ts)

However, the calculation can proceed no further because

ThinBy (≼) . extend d = extend d

The reason is that the tuples in extend d t have different residues and thinning can 
never eliminate any tuples. 

Instead, we have to back up and find an alternative proof that (10.1) holds. For 
this we need the key fact that, if t1 ≼ t2, then

∀e2 ∈ extend d t2 : ∃e1 ∈ extend d t1 : e1 ≼ e2        (10.2)

To prove (10.2), let t1 = (cs1, r, k1) and t2 = (cs2, r, k2), where t1 ≼ t2 so k1 ≤ k2.
Suppose e2 = (cs2 ++ [c], r - c * d, k2 + c). Then e1 = (cs1 ++ [c], r - c * d, k1 + c)
is in extend d t1 and e1 ≼ e2, establishing the result.

Now to prove (10.1), let us ← ThinBy (≼) ts and vs ← ThinBy (≼) (step d us).
We have to show that vs ← ThinBy (≼) (step d ts), that is,

vs ⊑ step d ts ∧ ∀w ∈ step d ts : (∃v ∈ vs : v ≼ w)

Recall that ⊑ denotes the subsequence relation. For the first conjunct, we can reason
as follows:

   vs
⊑    { definitions of vs and ThinBy }
   step d us
⊑    { since xs ⊑ ys ⇒ step d xs ⊑ step d ys }
   step d ts

For the second conjunct suppose w ∈ extend d t, where t ∈ ts. Since there exists
u ∈ us with u ≼ t, appeal to (10.2) says there exists e ∈ extend d u with e ≼ w. But
by definition of vs there exists a v ∈ vs with v ≼ e, so (10.1) follows on appeal to
the transitivity of ≼.

Summarising, we have shown that 

foldr tstep [([], n, 0)] ← ThinBy (≼) . mktuples n

where

tstep d ← ThinBy (≼) . concatMap (extend d)

As with the layered network problem, the thinning step will be more effective if
tuples with the same residue are brought together. This can be achieved by keeping
tuples in decreasing order of residue. Since extend produces tuples in this order, it
is sufficient to define tstep by

tstep d = thinBy (≼) . mergeBy cmp . map (extend d)
  where cmp t1 t2 = residue t1 >= residue t2

The definition of mergeBy :: (a -> a -> Bool) -> [[a]] -> [a] is left as an exercise.
The complete algorithm now reads

mkchange :: Nat -> [Denom] -> [Coin]
mkchange n = coins . minWith cost . foldr tstep [([], n, 0)]
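
For experimentation, the complete algorithm can be put together as follows. Several parts are assumptions rather than the text's own definitions: Nat is modelled as Int, the preorder is named leqTuple, minWith returns the leftmost minimum, and mergeBy (left as an exercise above) is given one straightforward definition that merges the lists pairwise from the right.

```haskell
type Nat = Int -- assumption: Nat modelled as Int
type Denom = Nat
type Coin = Nat
type Tuple = ([Coin], Nat, Nat) -- (coin counts, residue, count)

coins :: Tuple -> [Coin]
coins (cs, _, _) = cs

residue, count :: Tuple -> Nat
residue (_, r, _) = r
count (_, _, k) = k

cost :: Tuple -> (Nat, Nat)
cost t = (residue t, count t)

extend :: Denom -> Tuple -> [Tuple]
extend d (cs, r, k) = [(cs ++ [c], r - c * d, k + c) | c <- [0 .. r `div` d]]

minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy leq = foldr bump []
  where
    bump x [] = [x]
    bump x (y : ys)
      | x `leq` y = x : ys
      | y `leq` x = y : ys
      | otherwise = x : y : ys

-- One possible mergeBy (an assumption): merge lists that are each
-- ordered by cmp into a single list ordered by cmp.
mergeBy :: (a -> a -> Bool) -> [[a]] -> [a]
mergeBy cmp = foldr merge []
  where
    merge [] ys = ys
    merge xs [] = xs
    merge (x : xs) (y : ys)
      | cmp x y = x : merge xs (y : ys)
      | otherwise = y : merge (x : xs) ys

-- Same residue, no greater count.
leqTuple :: Tuple -> Tuple -> Bool
leqTuple t1 t2 = residue t1 == residue t2 && count t1 <= count t2

tstep :: Denom -> [Tuple] -> [Tuple]
tstep d = thinBy leqTuple . mergeBy cmp . map (extend d)
  where cmp t1 t2 = residue t1 >= residue t2

mkchange :: Nat -> [Denom] -> [Coin]
mkchange n = coins . minWith cost . foldr tstep [([], n, 0)]

ukds, urds :: [Denom]
ukds = [1, 2, 5, 10, 20, 50, 100, 200]
urds = [1, 2, 5, 15, 20, 50, 100]
```

Because merging keeps tuples in decreasing order of residue, tuples with equal residue are adjacent and thinBy leaves one tuple per residue after every step. For instance, mkchange 30 urds uses two coins of value 15, and mkchange 256 ukds uses four coins in total.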

The running time of mkchange is O(n²k) steps, where n is the amount for which
change is required and k is the number of denominations. At each step the number
of candidates in play is at most n + 1 because there is at most one candidate for each
residual amount r and 0 ≤ r ≤ n. A candidate with residue r has O(r) extensions, so
there can be O(n²) new candidates before thinning. Processing each denomination
therefore requires O(n²) steps, and there are k steps in total.

As a final remark, the coin-changing problem can be thought of as an instance 
of the layered network problem. Each layer contains one vertex for each residual 
amount and for the denominations considered so far. The edges between the layers 
correspond to the choices for the number of coins for the next denomination. For 
example, with change 17 and denominations [1,2,5,10] the first three layers of the 
network are illustrated in Figure 10.2. The connection between the two problems is 
no accident because all thinning algorithms involving a fold can be regarded as a 
shortest-path problem on a directed acyclic graph of some kind. This connection 
will be examined more closely later when we discuss dynamic programming.

Figure 10.2 Coin-changing as a layered network


10.4 The knapsack problem 

Our third and final problem in this chapter is a famous one called the knapsack problem.
This problem is usually given as a model instance of the dynamic programming
strategy, but we are going to give a thinning algorithm. The dynamic programming 
solution (see Chapter 13) is more restrictive in that it depends on certain quantities 
being integers. 

Here is the setting. Suppose a thief comes to your room in the night bearing a 
knapsack. Surveying the room, he discovers the following items: 


item             value   weight   value/weight
Laptop             30      14        2.14
Television         67      31        2.16
Jewellery          19       8        2.38
CD collection      50      24        2.08


Each item here has an integer value and weight but, in general, values and weights 
can be arbitrary positive real numbers. The thief would like to steal everything in the 
room, but his knapsack can support only a limited weight. Assuming the maximum 
weight the knapsack can hold is 50 units, what items should the thief steal in order 
to maximise the total value of his haul? 

He could decide to pack items in decreasing order of value. That gives 

swag = Television + Laptop (value 97, weight 45) 

He could decide to pack items in ascending order of weight. That gives 
swag = Jewellery + Laptop + CDs (value 99, weight 46) 

He could decide to pack in decreasing value/weight ratio. That gives 
swag = Jewellery + Television (value 86, weight 39) 
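
The three hauls can be reproduced with a few lines of code. This sketch is an illustration only: it assumes that each strategy considers the items in its chosen order and skips any item that no longer fits (which is what the first haul requires, since the CD collection is passed over). The names pack, byValue, byWeight and byRatio are hypothetical.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

type Item = (String, Int, Int) -- (name, value, weight)

items :: [Item]
items =
  [ ("Laptop", 30, 14)
  , ("Television", 67, 31)
  , ("Jewellery", 19, 8)
  , ("CD collection", 50, 24)
  ]

-- Consider items in the given order, taking each one that still fits.
pack :: Int -> [Item] -> ([String], Int, Int)
pack cap = foldl add ([], 0, 0)
  where
    add (ns, v, w) (n, v', w')
      | w + w' <= cap = (ns ++ [n], v + v', w + w')
      | otherwise = (ns, v, w)

-- The three greedy orderings discussed in the text.
byValue, byWeight, byRatio :: [Item] -> [Item]
byValue = sortOn (\(_, v, _) -> Down v)
byWeight = sortOn (\(_, _, w) -> w)
byRatio = sortOn (\(_, v, w) -> Down (fromIntegral v / fromIntegral w :: Double))
```

With a capacity of 50, pack 50 (byValue items), pack 50 (byWeight items) and pack 50 (byRatio items) give hauls of value 97, 99 and 86 respectively.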

Each of these strategies is, of course, a greedy strategy, trying to obtain a global 
optimum by making a sequence of locally optimal decisions. Lor this example the 
best strategy is the second one, but it is easy to give examples to show that packing 
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items in ascending order of weight is not always the best policy. In fact, there is no 
greedy algorithm for the problem. 
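
The three greedy strategies can be tried out with a small executable sketch. It assumes that a greedy thief considers the items in the chosen order and simply skips any item that no longer fits; the names items, pack, byValue, byWeight and byRatio are ours, introduced only for this illustration, and Int stands in for Nat.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

type Name = String
type Item = (Name, Int, Int)            -- (name, value, weight)

-- The items discovered in the room.
items :: [Item]
items =
  [ ("Laptop", 30, 14)
  , ("Television", 67, 31)
  , ("Jewellery", 19, 8)
  , ("CD collection", 50, 24)
  ]

-- Pack items in the given order, skipping any item that would
-- exceed the capacity (an assumption about what "greedy" means here).
pack :: Int -> [Item] -> ([Name], Int, Int)
pack cap = foldl add ([], 0, 0)
  where
    add (ns, v, w) (n, vi, wi)
      | w + wi <= cap = (ns ++ [n], v + vi, w + wi)
      | otherwise     = (ns, v, w)

-- The three orderings discussed in the text.
byValue, byWeight, byRatio :: Int -> [Item] -> ([Name], Int, Int)
byValue  cap = pack cap . sortBy (comparing (\(_, v, _) -> Down v))
byWeight cap = pack cap . sortBy (comparing (\(_, _, w) -> w))
byRatio  cap = pack cap . sortBy (comparing ratio)
  where
    ratio :: Item -> Down Double
    ratio (_, v, w) = Down (fromIntegral v / fromIntegral w)
```

With a capacity of 50 the three functions produce hauls of value 97, 99 and 86 respectively, in agreement with the figures above.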

The scenario above is known as the 0/1 knapsack problem: either an item is
chosen or it is not. In the more general integer knapsack problem, the scenario
changes to a warehouse rather than a room. The warehouse contains large numbers of
each individual item and the thief can choose an arbitrary number of each item
subject to the capacity of his knapsack. There is no greedy algorithm for this problem
either. There is, however, one version of the knapsack problem for which a greedy
algorithm does work. That is the fractional knapsack problem, in which the items
are things like gold dust, of which an arbitrary proportion of each item can be chosen.
In what follows we will concentrate on the 0/1 version of the knapsack problem,
leaving the other two as exercises.

We start off by defining various types and selector functions: 

type Name      = String
type Value     = Nat
type Weight    = Nat
type Item      = (Name, Value, Weight)
type Selection = ([Name], Value, Weight)

Each item has a name, a value, and a weight. A selection is a triple consisting of a 
list of item names, the total value of the selection, and its total weight. We will need 
the following three selector functions, the last two of which can be applied both to 
items and to selections, and hence are given a polymorphic type: 

name :: Item → Name
name (n, _, _) = n

value :: (a, Value, Weight) → Value
value (_, v, _) = v

weight :: (a, Value, Weight) → Weight
weight (_, _, w) = w

We can now specify swag (‘swag’ means money or goods taken by a thief) by 

swag :: Weight → [Item] → Selection
swag w ← MaxWith value · filter (within w) · selections

The nondeterministic function MaxWith cost is dual to MinWith cost in that the 
possible refinements are those with maximum cost rather than minimum cost. The 
first argument to swag is the maximum weight the knapsack can hold. The predicate 
within w is defined by 

within :: Weight → Selection → Bool
within w sn = weight sn ≤ w

There are two reasonable ways to define selections. One way is to write

selections :: [Item] → [Selection]
selections = foldr (concatMap · extend) [([ ], 0, 0)]
    where extend i sn = [sn, add i sn]

add :: Item → Selection → Selection
add i (ns, v, w) = (name i : ns, value i + v, weight i + w)

The other way is left as Exercise 10.21. At each step we can extend a selection
either by omitting the next item or by including it. The function selections returns
all possible subsequences of the given list of items, so there are 2ⁿ selections for a
list of n items.

As a first step, we fuse filter with selections to obtain a new function, which we
will call choices:

choices :: Weight → [Item] → [Selection]
choices w = foldr (concatMap · extend) [([ ], 0, 0)]
    where extend i sn = filter (within w) [sn, add i sn]

The function choices generates only those selections whose total weight is at most 
the carrying capacity of the knapsack. This step alone can significantly reduce the 
number of selections that have to be considered, but we can do even better with a 
thinning step. We rewrite the specification as 

swag w ← MaxWith value · ThinBy (≼) · choices w

where the appropriate choice here of preorder ≼ is the following one:

(≼) :: Selection → Selection → Bool
sn₁ ≼ sn₂ = value sn₁ ≥ value sn₂ ∧ weight sn₁ ≤ weight sn₂

In words, there is no point in keeping a selection from a list of items in play if
there is another selection from the same list with a greater value and a smaller
weight. We have sn₁ ≼ sn₂ ⇒ value sn₁ ≥ value sn₂, the necessary proviso for the
thin-introduction step in the case of MaxWith.

We can now fuse ThinBy and choices to arrive at the new definition

swag w = maxWith value · foldr tstep [([ ], 0, 0)]
    where tstep i     = thinBy (≼) · concatMap (extend i)
          extend i sn = filter (within w) [sn, add i sn]

The details are left as Exercise 10.20. As with the other thinning algorithms in this 
chapter, the thinning step will be more effective if the selections are kept in order. 
We can list selections either in decreasing order of value, or in increasing order of 
weight. Since extend produces selections in increasing order of weight, we choose 
the latter. 

The result is the following algorithm for swag:

swag w = maxWith value · foldr tstep [([ ], 0, 0)]
    where tstep i     = thinBy (≼) · mergeBy cmp · map (extend i)
          extend i sn = filter (within w) [sn, add i sn]
          cmp sn₁ sn₂ = weight sn₁ ≤ weight sn₂

This is our final algorithm for the knapsack problem. As to its running time, suppose
that all weights are integers. Each thinning step brings selections with equal weights
together and eliminates all but one of them, thereby maintaining a list of at most
w + 1 selections, each with a different weight from 0 up to w. This list can be
computed in O(w) steps in the worst case. There are n items to process, so the
running time is O(nw) steps. That appears to make the algorithm a linear-time one.
However, if weights are arbitrary positive real numbers, then there is no guarantee
that only w + 1 selections are maintained at each step. In fact all 2ⁿ selections
might have to be kept, each with a different total weight and value. That means the
algorithm can be exponential in n for non-integral weights.
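
For readers who want to run the final algorithm, here is one possible transcription into plain Haskell. It is a sketch under stated assumptions rather than a definitive implementation: Int stands in for Nat, the preorder ≼ is spelled dominates, thinBy is the left-to-right version from Answer 10.2, mergeBy is the definition from Answer 10.15, and maxWith is a simple fold that we supply ourselves for non-empty lists.

```haskell
type Name      = String
type Value     = Int
type Weight    = Int
type Item      = (Name, Value, Weight)
type Selection = ([Name], Value, Weight)

name :: Item -> Name
name (n, _, _) = n

value :: (a, Value, Weight) -> Value
value (_, v, _) = v

weight :: (a, Value, Weight) -> Weight
weight (_, _, w) = w

add :: Item -> Selection -> Selection
add i (ns, v, w) = (name i : ns, value i + v, weight i + w)

-- The thinning preorder: sn1 is at least as valuable and no heavier.
dominates :: Selection -> Selection -> Bool
dominates sn1 sn2 = value sn1 >= value sn2 && weight sn1 <= weight sn2

-- Left-to-right thinning of adjacent elements (Answer 10.2, second method).
thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy le (x : y : xs)
  | x `le` y  = thinBy le (x : xs)
  | y `le` x  = thinBy le (y : xs)
  | otherwise = x : thinBy le (y : xs)
thinBy _ xs = xs

-- Merge a list of ordered lists into one ordered list (Answer 10.15).
mergeBy :: (a -> a -> Bool) -> [[a]] -> [a]
mergeBy cmp = foldr merge []
  where
    merge xs [] = xs
    merge [] ys = ys
    merge (x : xs) (y : ys)
      | cmp x y   = x : merge xs (y : ys)
      | otherwise = y : merge (x : xs) ys

-- Some element with the largest f-value; the list must be non-empty.
maxWith :: Ord b => (a -> b) -> [a] -> a
maxWith f = foldr1 (\x y -> if f x >= f y then x else y)

swag :: Weight -> [Item] -> Selection
swag w = maxWith value . foldr tstep [([], 0, 0)]
  where
    tstep i     = thinBy dominates . mergeBy cmp . map (extend i)
    extend i sn = filter within [sn, add i sn]
    within sn   = weight sn <= w
    cmp sn1 sn2 = weight sn1 <= weight sn2

-- The items from the table at the start of the section.
items :: [Item]
items =
  [ ("Laptop", 30, 14)
  , ("Television", 67, 31)
  , ("Jewellery", 19, 8)
  , ("CD collection", 50, 24)
  ]
```

Evaluating swag 50 items gives the selection of value 99 and weight 46 found by the second greedy strategy, which is the best possible haul for this example.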

The last two examples seem very similar (even more so when Exercise 10.20 is
answered), so let's end the chapter by solving an abstract problem that captures all
of the essential ideas behind thinning when candidates is expressed in the following
way:

candidates :: [Data] → [Candidate]
candidates = foldr (concatMap · extend) [anon]

Here anon is some initial candidate.

Consider a specification of the form

best :: [Data] → Candidate
best ← MinWith cost · filter good · candidates

There are four ritual steps in calculating a thinning algorithm to solve this problem. 
The first step is to fuse filter good with candidates. This step is possible if 

good (extend d x) ⇒ good x

In other words, if a candidate is bad, then no extension of the candidate can ever be 
good. The candidate anon has to be good, otherwise there are no good candidates. 
We can now reason 

    filter good · concatMap (extend d)
  =   { since filter p · concatMap f = concatMap (filter p · f) }
    concatMap (filter good · extend d)
  =   { assumption }
    concatMap (filter good · extend d) · filter good

This establishes the fusion condition and so

filter good · foldr (concatMap · extend) [anon] = foldr step [anon]

where

step d = concatMap (filter good · extend d)

The second step is to introduce thinning. Suppose ≼ is a comparison function for
which x ≼ y ⇒ cost x ≤ cost y for all good candidates x and y. We can then appeal
to the thin-introduction law to refine the specification of best to read

best ← MinWith cost · ThinBy (≼) · foldr step [anon]

The third step is to fuse ThinBy (≼) with foldr. With

tstep d ← ThinBy (≼) · step d

we have

foldr tstep [anon] ← ThinBy (≼) · foldr step [anon]

provided the fusion condition

tstep d · ThinBy (≼) ← ThinBy (≼) · step d

holds. With the specification of tstep above, the proviso follows from

ThinBy (≼) · step d · ThinBy (≼) ← ThinBy (≼) · step d

which, as we have seen in (10.1) of Section 10.3, follows from the assumption

x ≼ y ⇒ ∀v ∈ goodext d y : ∃u ∈ goodext d x : u ≼ v

where goodext d x = filter good (extend d x). As a result we have

best = minWith cost · foldr step [anon]
    where step d = thinBy (≼) · concatMap (filter good · extend d)

The fourth and final step is to make thinning more effective by keeping the candidates
in order. Suppose value is some function on candidates such that extend produces
new candidates in, say, increasing order of value. Then we have as the final algorithm

best = minWith cost · foldr step [anon]
    where step d  = thinBy (≼) · mergeBy cmp · map (filter good · extend d)
          cmp x y = value x ≤ value y
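
The general scheme can be packaged as a single higher-order function. The sketch below is one way of doing so, not part of the original development: the name thinningBest, its argument order, and the toy instance subsetSum are all our own inventions for illustration. The instance picks a subsequence of non-negative numbers whose sum is as large as possible without exceeding a bound; two candidates are considered interchangeable when their sums are equal, which satisfies both provisos trivially.

```haskell
-- Left-to-right thinning of adjacent elements (Answer 10.2, second method).
thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy le (x : y : xs)
  | x `le` y  = thinBy le (x : xs)
  | y `le` x  = thinBy le (y : xs)
  | otherwise = x : thinBy le (y : xs)
thinBy _ xs = xs

-- Merge a list of ordered lists into one ordered list (Answer 10.15).
mergeBy :: (a -> a -> Bool) -> [[a]] -> [a]
mergeBy cmp = foldr merge []
  where
    merge xs [] = xs
    merge [] ys = ys
    merge (x : xs) (y : ys)
      | cmp x y   = x : merge xs (y : ys)
      | otherwise = y : merge (x : xs) ys

-- Some element with the smallest f-value; the list must be non-empty.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- The four-step scheme, parameterised on everything the text leaves abstract.
thinningBest
  :: Ord c
  => (d -> x -> [x])      -- extend: ways of extending a candidate
  -> (x -> Bool)          -- good: the filter predicate
  -> (x -> c)             -- cost: the quantity to minimise
  -> (x -> x -> Bool)     -- the thinning preorder
  -> (x -> x -> Bool)     -- cmp: the order in which candidates are kept
  -> x                    -- anon: the initial candidate
  -> [d] -> x
thinningBest extend good cost dom cmp anon =
  minWith cost . foldr step [anon]
  where
    step d = thinBy dom . mergeBy cmp . map (filter good . extend d)

-- Hypothetical instance: largest subset sum not exceeding a bound.
-- Assumes all numbers are non-negative, so extend yields sums in order.
subsetSum :: Int -> [Int] -> ([Int], Int)
subsetSum bound = thinningBest extend good cost dom cmp ([], 0)
  where
    extend d (ds, s)  = [(ds, s), (d : ds, s + d)]
    good (_, s)       = s <= bound
    cost (_, s)       = negate s
    dom (_, s) (_, t) = s == t
    cmp (_, s) (_, t) = s <= t
```

For example, subsetSum 11 [3, 5, 7] selects 3 and 7, whose sum 10 is the largest that does not exceed 11.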

It is possible, with more or less effort, to reformulate the three problems in this
chapter as instances of this general scheme, but the reformulation does not add
significantly to the understanding of the three algorithms. What is important is that
the derivation of a thinning algorithm follows a more or less standard path.


10.6 Chapter notes

The theory of thinning algorithms was described in [2], and developed further
by Sharon Curtis and Shin-Cheng Mu in their doctoral theses [3, 7]. The general
thinning theorem, along with a number of applications, appeared in [4] and [5]. The
knapsack problem has a long history, see [6], and a version of the thinning method
applied to this problem appears in [1].


References 

[1] Joachim H. Ahrens and Gerd Finke. Merging and sorting applied to the zero-one 
knapsack problem. Operations Research, 23(6):1099-1109, 1975.

[2] Richard S. Bird and Oege de Moor. The Algebra of Programming. Prentice-Hall, 
Hemel Hempstead, 1997. 

[3] Sharon Curtis. A Relational Approach to Optimization Problems. DPhil thesis, 
Oxford University Computing Laboratory, 1996. Technical Monograph PRG-122. 

[4] Oege de Moor. A generic program for sequential decision processes. In Programming 
Languages: Implementations, Logics and Programs, volume 982 of Lecture Notes in 
Computer Science, pages 1-23, Springer-Verlag, Berlin, 1995. 

[5] Oege de Moor. Dynamic programming as a software component. In Circuits, Systems, 
Computers and Communications, IEEE, 1999. Invited talk. 

[6] Silvano Martello and Paolo Toth. Knapsack Problems: Algorithms and Computer 
Implementations. John Wiley and Sons, Chichester, 1990. 

[7] Shin-Cheng Mu. A Calculational Approach to Program Inversion. DPhil thesis, 
Oxford University Computing Laboratory, 2003. Research Report PRG-RR-04-03. 


Exercises 

Exercise 10.1 Is ThinBy (≼) [ ] well-defined?

Exercise 10.2 Give a linear-time algorithm for thinBy (≼) that processes the list
from left to right.

Exercise 10.3 Here is a specification of a version of thinBy that computes shortest
thinnings:

thinBy (≼) xs ← MinWith length (candidates (≼) xs)

Give a definition of candidates. You can assume a function subseqs :: [a] → [[a]]
that returns all the subsequences of a sequence.

Exercise 10.4 Following on from the previous exercise, give a quadratic-time
algorithm for thinBy. No justification is required.

Exercise 10.5 Is the refinement law id ← ThinBy (≼) valid for all possible
definitions of ≼?

Exercise 10.6 Give an implementation thinBy of ThinBy for which the equation

thinBy (≼) = thinBy (≼) · thinBy (≼)

is false.

Exercise 10.7 The idempotency of ThinBy is captured as two refinements:

ThinBy (≼) ← ThinBy (≼) · ThinBy (≼)
ThinBy (≼) · ThinBy (≼) ← ThinBy (≼)

The first refinement is easy. Why? For the second we have to show for all xs that,
if ys ← ThinBy (≼) xs and zs ← ThinBy (≼) ys, then zs ← ThinBy (≼) xs. Why is
this assertion true?

Exercise 10.8 The thin-introduction law is also captured as two refinements:

MinWith cost ← MinWith cost · ThinBy (≼)
MinWith cost · ThinBy (≼) ← MinWith cost

The first refinement is easy. Why? Prove that the second refinement holds under the
proviso x ≼ y ⇒ cost x ≤ cost y.

Exercise 10.9 Prove the thin-elimination law, namely that

wrap · MinWith cost ← ThinBy (≼)

provided cost x ≤ cost y ⇒ x ≼ y.

Exercise 10.10 Prove the thin-filter law, namely that

filter p · ThinBy (≼) = ThinBy (≼) · filter p

provided x ≼ y ∧ p y ⇒ p x.

Exercise 10.11 Prove the thin-map law, namely that

map f · ThinBy (≼) ← ThinBy (≼) · map f

provided x ≼ y ⇒ f x ≼ f y.

Exercise 10.12 Give an example to show that the law

ThinBy (≼) · concat = concatMap (ThinBy (≼))

does not hold.

Exercise 10.13 For the layered network problem, prove that

filter connected · foldr op [[ ]] = foldr step [[ ]]

where

op es ps   = [e : p | e ← es, p ← ps]
step es ps = [e : p | e ← es, p ← ps, linked e p]

Exercise 10.14 Suppose in the layered network problem that there exists some k
such that the vertices in each layer are labelled with integers in the range 1 to k. This
can always be achieved by renaming the vertices, so there is no loss of generality in
the assumption. Then optimal paths from each vertex v can be stored as the vth
element of an array indexed from 1 to k. That idea leads to the following version of mcp:

mcp :: Nat → Net → Path
mcp k = snd · minWith fst · elems · foldr step start
    where start = array (1, k) [(v, (0, [ ])) | v ← [1 .. k]]
          step es pa = accumArray better initial (1, k) (map insert es)
              where initial = ...
                    insert (u, v, w) = ...
          better (c₁, p₁) (c₂, p₂) = if c₁ ≤ c₂ then (c₁, p₁) else (c₂, p₂)

Complete the algorithm by giving the definitions of initial and insert.

Exercise 10.15 Give a definition of mergeBy :: (a → a → Bool) → [[a]] → [a].

Exercise 10.16 In the coin-changing problem, suppose we had defined ≼ by

t₁ ≼ t₂ = (residue t₁ ≤ residue t₂) ∧ (count t₁ ≤ count t₂)

Now consider the list of tuples generated when the amount is 13 and the
denominations are [1, x, 5, 10]. Both ([1, 0], 3, 1) and ([0, 1], 8, 1) are on this list after
processing the denominations [5, 10]. Thinning with ≼ eliminates the latter. Why is
this wrong? (Hint: choose x.)

Exercise 10.17 The function extend in the coin-changing problem was defined by

extend d (cs, r, k) = [(cs ++ [c], r − c × d, k + c) | c ← [0 .. r div d]]

How should mkchange be redefined if the term cs ++ [c] is replaced by the more
efficient c : cs?

Exercise 10.18 The final algorithm for the coin-changing problem was

mkchange n = coins · minWith cost · foldr tstep [([ ], n, 0)]

What additional simplification is possible?

Exercise 10.19 In the definition of choices in the knapsack problem we defined the
local function

extend i sn = filter (within w) [sn, add i sn]

What minor optimisation to this definition is possible?

Exercise 10.20 To fuse ThinBy (≼) and choices in the knapsack problem, we have
to verify the fusion condition

ThinBy (≼) (step i (ThinBy (≼) sns)) ← ThinBy (≼) (step i sns)

where step i = concatMap (extend i). Prove that this condition holds.

Exercise 10.21 There is another way of defining the function selections, namely

selections :: [Item] → [Selection]
selections = foldr step [([ ], 0, 0)]
    where step i sns = sns ++ map (add i) sns

Write down the final thinning algorithm for this version of selections.

Exercise 10.22 The knapsack problem was specified using MaxWith value. But 
presumably the thief would prefer a selection that not only had the maximum value, 
but also had minimum weight. How would you modify the specification to take this 
aspect into account? 

Exercise 10.23 In the integer knapsack problem, selections have the type

type Selection = ([(Nat, Name)], Value, Weight)

Along with each item name there is a count of the number of times the item is
chosen. We can specify swag by

swag :: Weight → [Item] → Selection
swag w ← extract · MaxWith value · choices w

where extract retains only the chosen items:

extract :: Selection → Selection
extract (kns, v, w) = (filter nonzero kns, v, w)
    where nonzero (k, n) = k ≠ 0

Define choices. Hence write down a thinning algorithm for the integer knapsack
problem.

Exercise 10.24 Why is it impossible to write down an executable specification for
the fractional knapsack problem? It is, however, possible to show that a greedy
algorithm works. The method is to consider items in decreasing order of
value-to-weight ratio. At each step, the whole of the next item is chosen if the weight
constraint is satisfied, otherwise the maximum possible proportion of the next item
is chosen and the algorithm terminates. That way, up to the whole of the capacity of
the knapsack can be used. If all weights are integers, then it is possible to express
the greedy algorithm using rational arithmetic. Selections therefore have type

type Selection = ([(Rational, Name)], Value, Weight)
type Value     = Rational
type Weight    = Rational

Write down the definition of a function gswag that has the type

gswag :: Weight → [Item] → [(Rational, Name)]

Hint: one answer is to sort the values in increasing order of value-to-weight ratio
and then to process from right to left.


Answers

Answer 10.1 Yes, we have [ ] = ThinBy (≼) [ ].

Answer 10.2 One possibility is

thinBy (≼) = foldl bump [ ] · reverse
    where bump [ ] x = [x]
          bump (y : ys) x | x ≼ y     = x : ys
                          | y ≼ x     = y : ys
                          | otherwise = x : y : ys

Another method, which may not give the same answer, is to define

thinBy (≼) [ ] = [ ]
thinBy (≼) [x] = [x]
thinBy (≼) (x : y : xs)
    | x ≼ y     = thinBy (≼) (x : xs)
    | y ≼ x     = thinBy (≼) (y : xs)
    | otherwise = x : thinBy (≼) (y : xs)

Answer 10.3 The straightforward definition of candidates is

candidates (≼) xs = [ys | ys ← subseqs xs, ok xs ys]
    where ok xs ys = and [or [y ≼ x | y ← ys] | x ← xs]

The prelude functions and and or return the conjunction and disjunction of a list of
Booleans.

Answer 10.4 One definition is

thinBy (≼) = foldr gstep [ ]
    where gstep x ys = if any (≼ x) ys then ys else x : filter (not · (x ≼)) ys

where the prelude function any p is defined by any p = or · map p. This computes a
shortest thinning; but the proof that it does so is rather involved, and is omitted.

Answer 10.5 No, it holds only if ≼ is reflexive.

Answer 10.6 The following definition of thinBy removes at most one element:

thinBy (≼) [ ] = [ ]
thinBy (≼) [x] = [x]
thinBy (≼) (x : y : xs) = if x ≼ y then x : xs else x : thinBy (≼) (y : xs)

For example,

thinBy (≤) [1, 2, 3] = [1, 3]
thinBy (≤) [2, 1, 3] = [2, 1]

Thinning twice can remove two elements, so thinBy is not idempotent.

Answer 10.7 The first refinement follows from the identity law and the monotonicity
of functional composition under refinement. The second refinement follows from
the transitivity of ≼ and transitivity of ⊑, the subsequence relation.

Answer 10.8 The first follows from the identity law. For the second we have to
show that if ys ← ThinBy (≼) xs and y ← MinWith cost ys, then y ← MinWith cost xs.
This fact follows easily from the proviso.

Answer 10.9 We have to show that

x ← MinWith cost xs ⇒ [x] ← ThinBy (≼) xs

This comes down to cost x ≤ cost y ⇒ x ≼ y for all y ∈ xs, which is just the proviso.

Answer 10.10 We have to show that

ys ← ThinBy (≼) xs ∧ zs = filter p ys
    ⇒ zs ← ThinBy (≼) (filter p xs)

zs ← ThinBy (≼) (filter p xs)
    ⇒ (∃ys : ys ← ThinBy (≼) xs ∧ zs = filter p ys)

For the first implication we have zs ⊑ xs, because zs ⊑ ys ⊑ xs and ⊑ is transitive.
Furthermore, it follows from ys ← ThinBy (≼) xs and the proviso that, if x ∈ xs and
p x, then there exists a y ∈ ys such that y ≼ x and p y. Hence y ∈ zs, and so zs is a
valid refinement of ThinBy (≼) (filter p xs).

For the second implication take ys to be the subsequence of xs consisting of zs
together with all the elements of xs not satisfying p. Thus zs = filter p ys. Suppose
x ∈ xs; either p x holds, in which case there exists a z ∈ zs such that z ≼ x, or p x
does not hold, in which case x ∈ ys and x ≼ x. Hence ys is a valid refinement of
ThinBy (≼) xs.

Answer 10.11 Suppose ys ← ThinBy (≼) xs. We have to show that

map f ys ← ThinBy (≼) (map f xs)

This follows from the proviso and the fact that

ys ⊑ xs ⇒ map f ys ⊑ map f xs

The other thin-map law

ThinBy (≼) · map f ← map f · ThinBy (≼)

is also straightforward.

Answer 10.12 Let x ≼ y = (x ≤ y). Then

[1] ← ThinBy (≼) (concat [[1], [2]])

but

concat [ThinBy (≼) [1], ThinBy (≼) [2]] = [1, 2]





Answer 10.13 We have to show

filter connected (op es ps) = step es (filter connected ps)

The proof is

    filter connected (op es ps)
  =   { definition of op }
    [e : p | e ← es, p ← ps, connected (e : p)]
  =   { definition of connected }
    [e : p | e ← es, p ← filter connected ps, linked e p]
  =   { definition of step }
    step es (filter connected ps)

Answer 10.14 We can define mcp by

mcp :: Nat → Net → Path
mcp k = snd · minWith fst · elems · foldr step start
    where start = array (1, k) [(v, (0, [ ])) | v ← [1 .. k]]
          step es pa = accumArray better initial (1, k) (map insert es)
              where initial = (maxInt, [ ])
                    insert (u, v, w) = (u, (add w c, (u, v, w) : p))
                        where (c, p) = pa ! v
          better (c₁, p₁) (c₂, p₂) = if c₁ ≤ c₂ then (c₁, p₁) else (c₂, p₂)

The value maxInt and the function add are as defined in Dijkstra's algorithm:

maxInt :: Int
maxInt = maxBound

add w c = if c == maxInt then maxInt else w + c

Each call of step takes O(k + e), where e is the number of edges in the current layer,
so the total running time is O(dk²) as there are O(dk²) edges in total.

Answer 10.15 One definition is

mergeBy :: (a → a → Bool) → [[a]] → [a]
mergeBy cmp = foldr merge [ ]
    where merge xs [ ] = xs
          merge [ ] ys = ys
          merge (x : xs) (y : ys)
              | cmp x y   = x : merge xs (y : ys)
              | otherwise = y : merge (x : xs) ys

Answer 10.16 Because if x = 4 the tuple ([0, 1, 2], 0, 3) would not be produced and
the minimum-cost solution would not be found.

Answer 10.17 Simply replace coins by reverse · coins.

Answer 10.18 Since tstep produces answers in decreasing order of residue, we can
write

mkchange n = coins · last · foldr tstep [([ ], n, 0)]

Answer 10.19 Since only selections that satisfy the capacity constraint are
maintained, we could have defined

extend i sn = sn : filter (within w) [add i sn]

Other definitions are of course possible.

Answer 10.20 A direct attack fails because ThinBy (≼) · extend i = extend i. Instead
we have to show that (10.2) holds, namely that, if sn₁ ≼ sn₂, then

∀en₂ ∈ extend i sn₂ : ∃en₁ ∈ extend i sn₁ : en₁ ≼ en₂

This comes down to the fact that add i sn₁ is a valid choice if add i sn₂ is, and that

sn₁ ≼ sn₂ ⇒ add i sn₁ ≼ add i sn₂

Answer 10.21 The definition is

swag w = maxWith value · foldr tstep [([ ], 0, 0)]
    where tstep i sns = thinBy (≼) (mergeBy cmp [sns, sns'])
              where sns' = filter (within w) (map (add i) sns)
          cmp sn₁ sn₂ = weight sn₁ ≤ weight sn₂

Answer 10.22 One solution is to introduce a cost function

cost :: Selection → (Value, Weight)
cost sn = (value sn, negate (weight sn))

and replace maxWith value by maxWith cost. Another solution would be to replace
maxWith value by maxBy (≼), where

sn₁ ≼ sn₂ = value sn₁ < value sn₂ ∨
            (value sn₁ == value sn₂ ∧ weight sn₁ ≥ weight sn₂)

That would involve a new maximisation function

maxBy :: (a → a → Bool) → [a] → a
maxBy (≼) = foldr1 higher
    where higher x y = if x ≼ y then y else x

Answer 10.23 The definition is

choices :: Weight → [Item] → [Selection]
choices w = foldr (concatMap · choose) [([ ], 0, 0)]
    where choose i sn = [add k i sn | k ← [0 .. max]]
              where max = (w − weight sn) div weight i

add :: Nat → Item → Selection → Selection
add k i (kns, v, w) = ((k, name i) : kns, k × value i + v, k × weight i + w)

The function add k selects k copies of the next item, where k is constrained so that
the knapsack capacity is not exceeded. Note that values of k are chosen in increasing
order, so the weights of selections are in increasing order. The thinning algorithm is
then defined by

swag w = extract · maxWith value · foldr tstep [([ ], 0, 0)]
    where tstep i = thinBy (≼) · mergeBy cmp · map (choose i)
          choose i sn = [add k i sn | k ← [0 .. max]]
              where max = (w − weight sn) div weight i
          cmp sn₁ sn₂ = weight sn₁ ≤ weight sn₂

Answer 10.24 In the fractional knapsack problem there is an infinite, in fact an
uncountably infinite, number of choices for each item, one for each real number x in
the range 0 ≤ x ≤ 1. So no executable specification is possible. When the weights
are integers, each choice is a rational number r in the range 0 ≤ r ≤ 1, reducing the
number of choices to a countably infinite number. The greedy algorithm is

gswag :: Weight → [Item] → [(Rational, Name)]
gswag w = extract · foldr (add w) ([ ], 0, 0) · sortBy cmp

extract :: Selection → [(Rational, Name)]
extract (rns, _, _) = reverse rns

add :: Weight → Item → Selection → Selection
add w i (rns, vn, wn) = if wn == w then (rns, vn, wn)
                        else ((r, name i) : rns, vn + r × vi, wn + r × wi)
    where r  = min 1 ((w − wn) / wi)
          wi = fromIntegral (weight i)
          vi = fromIntegral (value i)

cmp :: Item → Item → Ordering
cmp i₁ i₂ = compare (value i₁ × weight i₂) (value i₂ × weight i₁)

For example, gswag 50 items returns the answer 

[(1 % 1, "Jewellery"), (1 % 1, "TV"), (11 % 14, "Laptop")] 

Quite how the thief steals eleven-fourteenths of a laptop we leave to your
imagination.




Chapter 11 


Segments and subsequences 


By definition, a segment of a list is a contiguous subsequence of the list. Thus "arb" 
is a segment of "barbara" while "bab" is a subsequence but not a contiguous one. 
A segment that begins a list is called a prefix or an initial segment, and one that ends 
a list a suffix or tail segment. Segments are also called factors or substrings in the
literature, but we will reserve the word ‘segment’ for the contiguous subsequences. 
A list can have an exponential number of subsequences but only a quadratic number 
of segments. 

Problems involving segments and subsequences abound in computing. For example,
they arise in genomics, text processing, data mining, and data compression.
Whole books have been written on ‘stringology’ and many interesting, subtle, and 
useful algorithms have been discussed and analysed over the years. In this chapter 
we confine our attention to three simply stated problems, one involving segments 
and two involving subsequences. The segment problem is the most complicated of 
the three, so we will begin with the two problems about subsequences. 


11.1 The longest upsequence 

Given a sequence of elements from an ordered type, the function lus computes some 
longest subsequence whose elements are in strictly increasing order (in other words, 
a longest upsequence): 

lus :: Ord a ⇒ [a] → [a]
lus ← MaxWith length · filter up · subseqs

For example, "lost" is a longest upsequence of "longest". The test up can be
defined by

up :: Ord a ⇒ [a] → Bool
up xs = and (zipWith (<) xs (tail xs))

The function subseqs can be defined in a number of ways (see the exercises); here
are two, both based on foldr. The first is to write

subseqs :: [a] → [[a]]
subseqs = foldr step [[ ]]
    where step x xss = xss ++ map (x:) xss

The second way is to write

subseqs :: [a] → [[a]]
subseqs = foldr (concatMap · extend) [[ ]]
    where extend x xs = [xs, x : xs]

The second method is essentially the one used in the definition of selections in the 
knapsack problem of the previous chapter. For the sake of variety we will adopt 
the first definition. In either case, straightforward implementation of lus leads to an 
algorithm with exponential time simply because there are an exponential number of 
subsequences that have to be checked. Our aim is to do better; in fact there is an 
O(n log n) time algorithm for the problem.

The first step in the standard recipe is to fuse filter up and subseqs to arrive at

lus ← MaxWith length · foldr step [[ ]]
    where step x xss = xss ++ map (x:) (filter (ok x) xss)
          ok x ys    = null ys ∨ x < head ys

Only upsequences are kept at each step. An element x can be added to the front of
an upsequence ys if either ys is the empty sequence or its first element is greater
than x.

The next step is to see whether a greedy algorithm is possible. Can we keep 
a single longest upsequence at each step? No, because while "ab" is the unique 
longest upsequence of "xab", the longest upsequence of "uvwxab" is "uvwx", so 
"x" cannot disappear from view and we need to keep more than one upsequence 
in play. A similar argument holds if the input is processed from left to right, so 
the failure is not due to use of foldr. The same argument shows that no obvious 
divide-and-conquer algorithm is possible either: we can split the input into two 
and compute a longest upsequence for each half, but these two upsequences do not 
provide sufficient information to determine a longest upsequence of the whole input. 
That all means we need to keep more than one candidate in play. So we introduce a 
thinning step: 

lus ← MaxWith length · ThinBy (≼) · foldr step [[ ]]

We have to ensure

xs ≼ ys ⇒ length xs ≥ length ys

for the thin-introduction step to be valid, but what else do we need? Well, when
building upsequences from right to left, one upsequence is clearly better than another
if it is no shorter and its first element, if it exists, is bigger. For instance, "jot" is
a better upsequence to keep in play than "dot" because the former allows more
symbols to be prefixed while maintaining an upsequence ("ijot" is an upsequence
but "idot" is not). We also want to keep the empty sequence as a candidate. That
all suggests defining ≼ by

[ ] ≼ [ ]           = True
(x : xs) ≼ [ ]      = False
[ ] ≼ (y : ys)      = False
(x : xs) ≼ (y : ys) = x ≥ y ∧ length xs ≥ length ys

The first and fourth clauses ensure that ^ is reflexive and the length condition holds. 

The next step is to fuse ThinBy (≼) and foldr step [[ ]]. To this end we reason as
follows:

    ThinBy (≼) (step x xss)
  =   { definition of step }
    ThinBy (≼) (xss ++ map (x:) (filter (ok x) xss))
  =   { distributive law of ThinBy }
    ThinBy (≼) (ThinBy (≼) xss ++ ThinBy (≼) (map (x:) (filter (ok x) xss)))
  →   { thin-map law (see below) }
    ThinBy (≼) (ThinBy (≼) xss ++ map (x:) (ThinBy (≼) (filter (ok x) xss)))
  =   { thin-filter law (see below) }
    ThinBy (≼) (ThinBy (≼) xss ++ map (x:) (filter (ok x) (ThinBy (≼) xss)))

The thin-map and thin-filter laws rely on the facts that

xs ≼ ys ⇒ x : xs ≼ x : ys
xs ≼ ys ∧ ok x ys ⇒ ok x xs

whose proofs are left as exercises. Hence, defining tstep by

tstep x xss = thinBy (≼) (step x xss)

we have

foldr tstep [[ ]] ← ThinBy (≼) · foldr step [[ ]]

and so lus ← MaxWith length · foldr tstep [[ ]]. Finally, the thinning process can be
made more effective by keeping subsequences in increasing order of length. That
all leads quite quickly to

lus = last · foldr tstep [[ ]]

tstep x xss = thinBy (≼) (mergeBy cmp [xss, yss])
    where yss       = map (x:) (filter (ok x) xss)
          cmp xs ys = length xs ≤ length ys

Ignoring length calculations, this version of lus takes O(nr) steps, where n is the
length of the input and r is the length of the longest upsequence. At most r + 1
upsequences are kept in play at each stage and these can be updated in O(r) steps.

To discover the path for further optimisation we need to look more closely at the
computation. Observe that at a typical stage a list of upsequences [xs₀, xs₁, ..., xsₖ] is
maintained in which xsⱼ has length j, and head xsⱼ > head xsⱼ₊₁ for 1 ≤ j < k. After
applying tstep x to this list, we obtain a new list

[xs₀, ..., xsⱼ, x : xsⱼ, xsⱼ₊₂, ..., xsₖ]

where head xsⱼ > x ≥ head xsⱼ₊₁ (assuming the heads of xs₀ and xsₖ₊₁ are infinitely
large and infinitely small, respectively). For example, since

foldr tstep [[ ]] "ripper" = ["", "r", "pr", "ipr"]

we obtain

foldr tstep [[ ]] "kripper" = ["", "r", "pr", "kpr"]
foldr tstep [[ ]] "cripper" = ["", "r", "pr", "ipr", "cipr"]
foldr tstep [[ ]] "tripper" = ["", "t", "pr", "ipr"]

That means tstep can be redefined fo read 

tstep X ([ ]: xss) = [ ]: search x [ ] xss 
where search x X5 [ ] =[x'.xs] 

search x xs (y^: x^^) 

I head ys>x = ys'. search x ys xss 
I otherwise = (x: xs): xss 

This version of tstep finds the required insertion point by linear search from left
to right: the first ys such that head ys ≤ x is replaced by x:xs, where xs is the
upsequence immediately preceding ys. If there is no such ys, then x:xs is added
to the end of the list, as in the "cripper" example above. Length calculations no
longer appear and the running time of lus is O(nr) steps. In an imperative setting,
the running time can be improved to O(n log r) steps by using an array and binary
search to locate the required insertion point. However, since the array also has to
be updated at each step, and array updates take linear time in a purely functional
setting, this solution does not improve the running time. The alternative is to use a
balanced binary search tree, but we will leave the details to Exercise 11.6.
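
The linear-search version of lus can be transcribed into plain Haskell more or less directly. The following is a minimal sketch rather than a definitive implementation: the local function search is restructured slightly so that x is taken from the enclosing scope, and the final catch-all clause of tstep is an addition of ours that only reports a broken invariant.

```haskell
-- Longest upsequence (strictly increasing subsequence) by thinning,
-- using the linear-search tstep described in the text.
lus :: Ord a => [a] -> [a]
lus = last . foldr tstep [[]]

-- Invariant: the list starts with [] and holds one upsequence per length,
-- with the heads of the nonempty upsequences in decreasing order.
tstep :: Ord a => a -> [[a]] -> [[a]]
tstep x ([] : xss) = [] : search [] xss
  where
    -- xs is the upsequence immediately preceding the current candidate
    search xs [] = [x : xs]
    search xs (ys : yss)
      | head ys > x = ys : search ys yss
      | otherwise   = (x : xs) : yss
-- Hypothetical guard, not in the text: the invariant should make this unreachable.
tstep _ _ = error "tstep: list of upsequences must start with []"
```

With these definitions, foldr tstep [[]] "ripper" evaluates to ["", "r", "pr", "ipr"], in agreement with the example above.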


11.2 The longest common subsequence 

The problem of finding the longest common subsequence of two sequences has 
many applications in computing, basically because such a subsequence is a useful 
measure of how similar the two sequences are. In this section we consider a function
lcs so that lcs xs ys returns a longest common subsequence of xs and ys. The problem
is interesting because of the number of different ways to solve it. We begin with a 
specification of the problem, in fact with two specifications. The first is to define 

    lcs xs ys ← MaxWith length (intersect (subseqs xs) (subseqs ys))

where intersect returns the common elements of two lists. The second specification,
and the one we will use, is to define

    lcs xs ← MaxWith length . filter (sub xs) . subseqs

where the test sub xs ys determines whether ys is a subsequence of xs:

    sub xs [] = True
    sub [] (y:ys) = False
    sub (x:xs) (y:ys) = if x == y then sub xs ys else sub xs (y:ys)

The first specification maintains symmetry between xs and ys, while the second
breaks it. The advantage of the second specification is simply that it places us in 
familiar territory for ferreting out a thinning algorithm. 
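
The second specification can be executed directly, which is useful for checking later refinements on small inputs. The sketch below makes two assumptions: the library function subsequences from Data.List stands in for subseqs (it may enumerate subsequences in a different order), and maximumBy (comparing length) stands in for one deterministic choice of MaxWith length. The name lcsSpec is ours.

```haskell
import Data.List (maximumBy, subsequences)
import Data.Ord (comparing)

-- sub xs ys tests whether ys is a subsequence of xs.
sub :: Eq a => [a] -> [a] -> Bool
sub _ [] = True
sub [] (_ : _) = False
sub (x : xs) (y : ys) = if x == y then sub xs ys else sub xs (y : ys)

-- Executable specification: exponential time, for small inputs only.
-- The filtered list always contains [], so maximumBy is safe here.
lcsSpec :: Eq a => [a] -> [a] -> [a]
lcsSpec xs = maximumBy (comparing length) . filter (sub xs) . subsequences
```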

For a functional programmer happy with recursion as their basic tool there is a 
simple way to solve the problem, which is to write 

    lcs [] ys = []
    lcs xs [] = []
    lcs (x:xs) (y:ys) = if x == y then x : lcs xs ys
                        else longer (lcs (x:xs) ys) (lcs xs (y:ys))

The function longer returns the longer of two lists. This solution is an attractive one
because there are no subseqs, filter, or intersect operations, and it can be justified
by starting with the symmetric specification of lcs and considering the various cases
that can arise. However, this solution takes exponential time, and the reason it does 
so is because it involves computing the solutions to the same subproblems many 
times over. The way to solve this problem is by dynamic programming, so we will 
return to this solution in the next part of the book. 
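
For reference, here is the recursive solution as a small runnable sketch. The definition of longer is an assumption on our part, since the text only says that it returns the longer of two lists; ties are resolved in favour of the first argument.

```haskell
-- Direct recursive solution: simple, but exponential in the worst case
-- because the same subproblems are solved repeatedly.
lcs :: Eq a => [a] -> [a] -> [a]
lcs [] _ = []
lcs _ [] = []
lcs (x : xs) (y : ys)
  | x == y    = x : lcs xs ys
  | otherwise = longer (lcs (x : xs) ys) (lcs xs (y : ys))

-- Assumed definition: on a tie the first list is returned.
longer :: [a] -> [a] -> [a]
longer xs ys = if length xs >= length ys then xs else ys
```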

For a mathematician there is another way to solve the problem. Mathematicians 
like to reduce problems with unknown solutions to problems with known solutions. 
We can do that here. After the previous section we know that the function lus for
computing a longest upsequence can be implemented reasonably efficiently and, with a
little bit of cleverness, we can compute lcs in terms of lus. The solution takes the
form

    lcs xs ys = decode (lus (encode xs ys)) ys

We encode xs and ys as a single list over an ordered alphabet, solve the longest
upsequence problem on this encoded list, and then decode the result. Here is how
we encode the two sequences. Suppose ys is the list

     0   1   2   3   4   5
    'b' 'a' 'a' 'b' 'c' 'a'

The positions of the elements are recorded above the elements. Let xs be the string
"baxca". For each letter in xs we record the positions in ys at which this letter
occurs, but in reverse order. Thus

    posns 'b' = [3,0], posns 'a' = [5,2,1], posns 'x' = [],
    posns 'c' = [4], posns 'a' = [5,2,1]

The encoded string is the concatenation [3,0,5,2,1,4,5,2,1] of these positions. A
longest upsequence of this list is [0,1,4,5], which decodes to "baca", the longest
common subsequence of xs and ys. We leave it as an exercise to show why the trick
works, and also to supply the definitions of encode and decode. In the worst case
the encoded string can have length O(n²) when both inputs have length n, so the
computation of lus can take O(n² log n) steps. A thinning approach can bring this
worst case time down to O(n²) steps.
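
The reduction can be tried out on the example above. The sketch below uses the definitions of encode, posns and decode suggested in Answer 11.9 together with the linear-search lus of Section 11.1, so it runs in O(nr) rather than O(n log r) steps on the encoded list. The name lcsViaLus and the catch-all clauses are ours.

```haskell
-- Positions in ys at which x occurs, in decreasing order.
posns :: Eq a => [a] -> a -> [Int]
posns ys x = reverse [i | (i, y) <- zip [0 ..] ys, y == x]

encode :: Eq a => [a] -> [a] -> [Int]
encode xs ys = concatMap (posns ys) xs

-- Pick out the elements of ys at the given increasing positions.
decode :: [Int] -> [a] -> [a]
decode us ys = pick us (zip [0 ..] ys)
  where
    pick (u : us') ((p, y) : pys)
      | u == p    = y : pick us' pys
      | otherwise = pick (u : us') pys
    pick _ _ = []

-- Linear-search longest upsequence, as in Section 11.1.
lus :: Ord a => [a] -> [a]
lus = last . foldr tstep [[]]
  where
    tstep x ([] : xss) = [] : search x [] xss
    tstep _ _ = error "lus: invariant violated"
    search x xs [] = [x : xs]
    search x xs (ys : yss)
      | head ys > x = ys : search x ys yss
      | otherwise   = (x : xs) : yss

lcsViaLus :: Eq a => [a] -> [a] -> [a]
lcsViaLus xs ys = decode (lus (encode xs ys)) ys
```

Here encode "baxca" "baabca" yields [3,0,5,2,1,4,5,2,1]. The upsequence found by this particular lus need not be [0,1,4,5], but every longest upsequence of the encoded list decodes to "baca".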

The first step in the standard recipe is to fuse filter (sub xs) and subseqs. The
success of this step relies on the fact that sub xs (y:ys) ⇒ sub xs ys. In words, if
y:ys is a subsequence of xs, then so is ys. Here is the result of the fusion step:

    lcs xs ← MaxWith length . foldr step [[]]
      where step y yss = yss ++ filter (sub xs) (map (y:) yss)

Instead of filtering at the end, we can filter at each step. 

The next step is to check whether a greedy algorithm is possible. The longest 
common subsequence of "abc" and "cab" is "ab", which cannot be extended 
leftwards to the longest common subsequence of "abc" and "abcab", namely 
"abc". So we cannot maintain a single subsequence at each step, and have to 
introduce thinning. In order to determine which subsequences to keep we need to 
know the position of each subsequence in xs. A subsequence can occur more than 
once in a sequence; for example "ba" appears four times in "baabca", in positions 
[0,1], [0,2], [0,5], [3,5]. When building subsequences from right to left, that is 
adding elements to the front of subsequences, we want the position that is lexically 
the largest, namely [3,5] in the above example. Such a choice gives the greatest 
freedom in adding common elements to the front of the sequence. We do not need the 
full position in calculations, but only the position of the first element. So the position 
of "ba" we want is 3, the rightmost position at which the last occurrence of "ba" 
in "baabca" starts. The rightmost position of the empty sequence in "baabca" is 
6. We will set the position of a sequence ys in a sequence xs to be -1 if ys is not
a subsequence of xs. The definition of position is left as an exercise. 

A subsequence can be discarded if there is another subsequence whose length
and position are at least as large. Hence for fixed xs we can define

    ys ≼ zs = length ys ≥ length zs ∧ position xs ys ≥ position xs zs

The thin-introduction law is now applicable, giving that lcs xs is a refinement of

    MaxWith length . ThinBy (≼) . foldr step [[]]

The next step is to fuse ThinBy with foldr. We can keep subsequences in increasing
order of position, and therefore in decreasing order of length, by merging at each
step. Moreover, the test filter (sub xs) can be removed from the computation if all
sequences with negative positions are discarded. That leads to

    lcs xs = head . foldr tstep [[]]
      where tstep y yss = thinBy (≼) (mergeBy cmp [yss, zss])
              where zss = dropWhile negpos (map (y:) yss)
            negpos ys = position xs ys < 0
            ys ≼ zs = length ys ≥ length zs ∧
                      position xs ys ≥ position xs zs
            cmp ys zs = position xs ys ≤ position xs zs
The final optimisation is to avoid multiple computations of position and length. To
this end we represent a subsequence us of xs by a quadruple (p, k, ws, us) in which

    p = position xs us
    k = length us
    ws = reverse (take p xs)

For example, the representation of "ba" as a subsequence of "baabca" is the
quadruple (3, 2, "aab", "ba"). The function cons x, which replaces (x:), is defined
by

    cons x (p, k, ws, us) = (p - 1 - length as, k + 1, tail bs, x:us)
      where (as, bs) = span (≠ x) ws

For example,

    cons 'b' (3, 2, "aab", "ba") = (0, 3, "", "bba")
    cons 'x' (3, 2, "aab", "ba") = (-1, 3, ⊥, "xba")

If x:us is not a subsequence of xs, then the first component is negative and the third
is undefined. Now we can define

    lcs xs = ext . head . foldr tstep start
      where start = [(length xs, 0, reverse xs, [])]
            tstep y yss = thinBy (≼) (mergeBy cmp [yss, zss])
              where zss = dropWhile negpos (map (cons y) yss)
            negpos ys = psn ys < 0
            q1 ≼ q2 = psn q1 ≥ psn q2 ∧ lng q1 ≥ lng q2
            cmp q1 q2 = psn q1 ≤ psn q2

where ext, psn, and lng are the selector functions

    ext (p,k,ws,us) = us
    psn (p,k,ws,us) = p
    lng (p,k,ws,us) = k

This algorithm takes O(mn) steps, where m and n are the lengths of xs and ys.
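
The final thinning algorithm can be assembled into a self-contained program. The sketch below is written under stated assumptions: thinBy and mergeBy are not defined in this section, so the versions given here (a single right-to-left pass that compares neighbours, and a fold over a two-way merge) are our reading of how those functions behave; the preorder is passed as an ordinary function named dom, and the type synonym Quad is ours.

```haskell
-- (position, length, reversed prefix of xs before the position, subsequence)
type Quad a = (Int, Int, [a], [a])

psn, lng :: Quad a -> Int
psn (p, _, _, _) = p
lng (_, k, _, _) = k

ext :: Quad a -> [a]
ext (_, _, _, us) = us

-- Prepend x, moving the position left to the nearest earlier occurrence of x.
-- If there is none, the position is negative and the third component is
-- undefined; laziness means it is never inspected.
cons :: Eq a => a -> Quad a -> Quad a
cons x (p, k, ws, us) = (p - 1 - length pre, k + 1, tail post, x : us)
  where (pre, post) = span (/= x) ws

-- Assumed definition: drop a neighbour that is dominated by the other.
thinBy :: (a -> a -> Bool) -> [a] -> [a]
thinBy dom = foldr bump []
  where
    bump x [] = [x]
    bump x (y : ys)
      | dom x y   = x : ys
      | dom y x   = y : ys
      | otherwise = x : y : ys

-- Assumed definition: merge sorted lists with respect to cmp.
mergeBy :: (a -> a -> Bool) -> [[a]] -> [a]
mergeBy cmp = foldr merge []
  where
    merge xs [] = xs
    merge [] ys = ys
    merge (x : xs) (y : ys)
      | cmp x y   = x : merge xs (y : ys)
      | otherwise = y : merge (x : xs) ys

lcs :: Eq a => [a] -> [a] -> [a]
lcs xs = ext . head . foldr tstep start
  where
    start = [(length xs, 0, reverse xs, [])]
    tstep y yss = thinBy dom (mergeBy cmp [yss, zss])
      where zss = dropWhile negpos (map (cons y) yss)
    negpos q = psn q < 0
    dom q1 q2 = psn q1 >= psn q2 && lng q1 >= lng q2
    cmp q1 q2 = psn q1 <= psn q2
```

The list of quadruples is kept in increasing order of position and decreasing order of length, so the head is always a longest candidate.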


11.3 A short segment with maximum sum 

Our third problem is easy to state but not so easy to solve, at least not with an 
efficient algorithm. Given a list of positive and negative integers, the problem is 
simply to return a segment of the list that has the largest possible sum subject to the 
segment not being too long. Thus we want to compute mss, where 

    mss :: Nat → [Integer] → [Integer]
    mss b ← MaxWith sum . filter (short b) . segments

and short is defined by

    short :: Nat → [a] → Bool
    short b xs = (length xs ≤ b)

For example,

    mss 3 [1,-2,3,0,-5,3,-2,3,-1] = [3,-2,3]

The function segments is defined below. Straightforward computation of mss takes
O(bn) steps. There are Θ(bn) short segments in a list of length n, and we can
generate all of them, along with their sums, in this time. Finding one with a maximum
sum takes linear time, so the algorithm takes O(bn) steps. However, b may be quite
large, and O(n) is a much better bound on the algorithm. The aim of this section is
to describe an algorithm with such a bound. The algorithm is interesting because a 
significant change of representation is required to achieve the desired efficiency, but 
it is still basically a thinning algorithm. 

First of all, here is one definition of segments: 

    segments :: [a] → [[a]]
    segments = concatMap inits . tails

The segments of the list are therefore obtained by taking all the prefixes of all the 
suffixes. As we will see, this leads to an algorithm that processes the input from 
right to left. We could also have chosen to take all suffixes of all prefixes, in which 
case the algorithm proceeds from left to right. The functions inits and tails were 
discussed in Chapter 2 and are provided in the library Data.List. Both functions 
include the empty list as a prefix or suffix, so the empty list appears n + 1 times in
the segments of a list of length n. There are easy modifications to the definitions of
inits and tails that produce only nonempty segments. However, allowing the empty
segment as a candidate means that a short segment with maximum sum in a list of
negative numbers is the empty sequence.

We can now reason

    MaxWith sum . filter (short b) . segments
  = { definition of segments }
    MaxWith sum . filter (short b) . concatMap inits . tails
  = { since filter p . concat = concat . map (filter p) }
    MaxWith sum . concatMap (filter (short b) . inits) . tails
  = { distributive law }
    MaxWith sum . map (MaxWith sum . filter (short b) . inits) . tails
  → { with msp b ← MaxWith sum . filter (short b) . inits }
    MaxWith sum . map (msp b) . tails

Summarising this calculation, we have shown that

    mss b ← MaxWith sum . map (msp b) . tails
    msp b ← MaxWith sum . filter (short b) . inits

The new function msp computes a short prefix with maximum sum. For example,

    [-2,4,4,-5,8,-2,3,1] = [-2,4,4]

    [-2,4,4,-5,8,-2,3,1] = [-2,4,4,-5,8]

The new form of mss suggests an appeal to the Scan Lemma, an essential tool when
dealing with problems involving segments. The Scan Lemma was mentioned in
Answer 1.12, but here it is again:

    map (foldr op e) . tails = scanr op e

Applied to a list of length n, the left-hand side requires Θ(n²) applications of op,
while the right-hand side requires only Θ(n) applications. The function scanr is a
Haskell function in the library Data.List, whose definition is basically as follows:

    scanr :: (a → b → b) → b → [a] → [b]
    scanr op e [] = [e]
    scanr op e (x:xs) = op x (head ys) : ys where ys = scanr op e xs

For example,

    scanr (⊕) e [x,y] = [x ⊕ (y ⊕ e), y ⊕ e, e]
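
The Scan Lemma is easy to check on concrete inputs. The following sketch defines both sides as functions so that they can be compared; the names scanLemmaLhs and scanLemmaRhs are ours and are purely illustrative.

```haskell
import Data.List (tails)

-- Left-hand side: fold every suffix separately (quadratically many op steps).
scanLemmaLhs :: (a -> b -> b) -> b -> [a] -> [b]
scanLemmaLhs op e = map (foldr op e) . tails

-- Right-hand side: a single right-to-left scan (linearly many op steps).
scanLemmaRhs :: (a -> b -> b) -> b -> [a] -> [b]
scanLemmaRhs = scanr
```

For instance, both scanLemmaLhs (+) 0 [1,2,3] and scanLemmaRhs (+) 0 [1,2,3] give the suffix sums [6,5,3,0].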

Later on, we will need the companion function scanl:

    scanl :: (b → a → b) → b → [a] → [b]
    scanl op e [] = [e]
    scanl op e (x:xs) = e : scanl op (op e x) xs

For example,

    scanl (⊕) e [x,y] = [e, e ⊕ x, (e ⊕ x) ⊕ y]

The Scan Lemma suggests we look for a definition of msp as an instance of foldr.
Then we would obtain a definition of mss in terms of scanr. More precisely, if we
can find a definition of msp in the form

    msp b = foldr (op b) []

then we can refine mss to read

    mss b ← MaxWith sum . scanr (op b) []

As it happens there is such a definition of msp, but it doesn't help:

    msp b = foldr (op b) [] where op b x xs = msp b (x:xs)

This identity cannot serve as a legitimate Haskell definition of msp because it is
circular. In effect it states no more nor less than that msp b (x:xs) is a prefix of
x : msp b xs. Exercise 11.13 asks for a proof of this assertion.

Instead we will follow the standard thinning recipe. The first step is to fuse
filter (short b) with inits, thereby producing only short prefixes. The function inits
can be expressed in terms of foldr:

    inits :: [a] → [[a]]
    inits = foldr step [[]] where step x xss = [] : map (x:) xss

Since the elements of xss are lists in increasing order of length from 0 up to k, where
k is the length of xss, we have

    filter (short b) (step x xss) = if length (last xss) == b
                                    then [] : map (x:) (init xss)
                                    else [] : map (x:) xss

In words, if adding a new element to the front of the list increases its length beyond
b, then we can simply cut out the last list. An appeal to the fusion law of foldr then
leads to

    msp b ← MaxWith sum . foldr (op b) [[]]

where

    op b x xss = [] : map (x:) (cut b xss)
    cut b xss = if length (last xss) == b then init xss else xss

Later on we will see how to make the computation of cut more efficient.

The next step is to introduce thinning, refining msp to read

    msp b ← MaxWith sum . ThinBy (≼) . foldr (op b) [[]]

An appropriate choice of preorder is

    xs ≼ ys = (sum xs ≥ sum ys) ∧ (length xs ≤ length ys)

In words, there is no point in keeping a prefix if there is another prefix that is shorter
and whose sum is at least as large. For example, optimal thinning of

    foldr (op 7) [[]] [-2,4,4,-5,8,-2,3,9]

produces the prefixes

    [], [-2,4], [-2,4,4], [-2,4,4,-5,8], [-2,4,4,-5,8,-2,3]

of length at most 7 with sums 0,2,6,9,10. These prefixes are in increasing order of
length as well as increasing order of sum.

The next step, another appeal to fusion, is to thin at each step rather than just 
once at the end. Thinning can be implemented by taking advantage of the fact that 
the prefixes are in strictly increasing order of length and in strictly increasing order 
of sum. This means we only have to delete a nonempty prefix if its sum is less than 
or equal to zero. That gives 

    msp b = last . foldr (op b) [[]]
    op b x xss = [] : thin (map (x:) (cut b xss))
    thin = dropWhile (λxs. sum xs ≤ 0)

In other words, we cut from the end of the list to keep the prefixes short, and thin
from the front of the list to keep sums positive. The prefix with the largest sum is
the last prefix in the sequence. Now we can define

    mss b = maxWith sum . map last . scanr (op b) [[]]

However, this definition of op will not suffice for the final algorithm. Even ignoring
the cost of cutting and thinning, the map operations mean that computation of op
on a list of length k takes O(k) steps. In the worst case, when the input is a list of
positive numbers, we have k = b, so the total running time of mss is O(bn) steps
for an input of length n. That's no better than before. The way to achieve a bound of
O(n) steps is by changing the representation of the list of prefixes.
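
Before changing the representation, it may help to see the O(bn) version as a runnable program. This is a sketch under two assumptions: Int is used in place of Nat for the length bound, and maxWith is defined as a simple fold that returns one element with a maximum value of f (the text uses maxWith without defining it in this section).

```haskell
-- Assumed definition: some element maximising f; requires a nonempty list.
maxWith :: Ord b => (a -> b) -> [a] -> a
maxWith f = foldr1 (\x y -> if f x >= f y then x else y)

-- Cut from the end to keep prefixes short; thin from the front to keep sums positive.
op :: Int -> Integer -> [[Integer]] -> [[Integer]]
op b x xss = [] : thin (map (x :) (cut b xss))

cut :: Int -> [[a]] -> [[a]]
cut b xss = if length (last xss) == b then init xss else xss

thin :: [[Integer]] -> [[Integer]]
thin = dropWhile (\xs -> sum xs <= 0)

-- A short prefix with maximum sum.
msp :: Int -> [Integer] -> [Integer]
msp b = last . foldr (op b) [[]]

-- A short segment with maximum sum, via the Scan Lemma.
mss :: Int -> [Integer] -> [Integer]
mss b = maxWith sum . map last . scanr (op b) [[]]
```

With these definitions, mss 3 [1,-2,3,0,-5,3,-2,3,-1] evaluates to [3,-2,3], as in the example at the start of the section.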

The idea is simple enough: represent the list of prefixes by their differences. For 
example, instead of maintaining the list 

[[], [-2,4], [-2,4,4], [-2,4,4,-5,8], [-2,4,4,-5,8,-2,3]] 

we maintain the partition [[-2,4], [4], [-5,8], [-2,3]] of the last element. More
precisely, suppose we define the abstraction function

    abst :: [[a]] → [[a]]
    abst = scanl (++) []

Then

    abst [[-2,4],[4],[-5,8],[-2,3]]
      = [[],[-2,4],[-2,4,4],[-2,4,4,-5,8],[-2,4,4,-5,8,-2,3]]

In particular, last . abst = concat. To effect the change in representation we need a
function, opR say, so that

    abst (opR b x xss) = op b x (abst xss)

Then, by the fusion law of foldr, we have

    abst . foldr (opR b) [] = foldr (op b) [[]]

since abst [] = [[]]. Note that we seek to apply the law in the anti-fusion or fission
direction, splitting the fold on the right into two functions. To define opR we need
the function

    cutR b xss = if length (concat xss) == b then init xss else xss

as a replacement for cut. The function cutR satisfies

    cut b (abst xss) = abst (cutR b xss)

We also need a replacement for thin, which we will call thinR. This function will
satisfy

    [] : thin (map (x:) (abst xss)) = abst (thinR x xss)

We can now define opR by

    opR b x xss = thinR x (cutR b xss)

Here is the proof that this choice works:

    abst (opR b x xss)
  = { definition of opR }
    abst (thinR x (cutR b xss))
  = { above property of thinR }
    [] : thin (map (x:) (abst (cutR b xss)))
  = { above property of cutR }
    [] : thin (map (x:) (cut b (abst xss)))
  = { definition of op }
    op b x (abst xss)

Now, putting everything together, we have

    mss b
  = { definition of mss in terms of msp }
    maxWith sum . map (msp b) . tails
  = { definition of msp in terms of foldr }
    maxWith sum . map (last . foldr (op b) [[]]) . tails
  = { definition of opR }
    maxWith sum . map (last . abst . foldr (opR b) []) . tails
  = { Scan Lemma }
    maxWith sum . map (last . abst) . scanr (opR b) []
  = { since last . abst = concat }
    maxWith sum . map concat . scanr (opR b) []

Hence

    mss b = maxWith sum . map concat . scanr (opR b) []

It remains to give the definition of thinR:

    thinR x xss = add [x] xss
      where add xs xss
              | sum xs > 0 = xs : xss
              | null xss   = []
              | otherwise  = add (xs ++ head xss) (tail xss)

For example,

    add [-5] [[-2,3],[6],[-1,4]] = add [-5,-2,3] [[6],[-1,4]]
                                 = add [-5,-2,3,6] [[-1,4]]
                                 = [[-5,-2,3,6],[-1,4]]

If the current segment has positive sum, then it is added to the front of the list 
of segments; otherwise it is concatenated with the next segment and the process 
is repeated. If no segment has positive sum, then the empty list is returned. The 
function add is similar to the function collapse we considered in Section 1.5 and 
indeed was the inspiration for collapse. 
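
The version of mss based on partitions can also be run as it stands, before any of the efficiency measures are applied. In this sketch add is lifted to the top level so that it can be tested on its own, Int replaces Nat, maxWith is an assumed definition as before, and the name mssR is ours. The bound b is assumed to be at least 1, since cutR would otherwise take init of an empty list.

```haskell
-- Assumed definition: some element maximising f; requires a nonempty list.
maxWith :: Ord b => (a -> b) -> [a] -> a
maxWith f = foldr1 (\x y -> if f x >= f y then x else y)

-- Drop the last segment when the partition already covers b elements.
cutR :: Int -> [[a]] -> [[a]]
cutR b xss = if length (concat xss) == b then init xss else xss

-- Absorb following segments until the leading segment has positive sum.
add :: [Integer] -> [[Integer]] -> [[Integer]]
add xs xss
  | sum xs > 0 = xs : xss
  | null xss   = []
  | otherwise  = add (xs ++ head xss) (tail xss)

thinR :: Integer -> [[Integer]] -> [[Integer]]
thinR x = add [x]

opR :: Int -> Integer -> [[Integer]] -> [[Integer]]
opR b x xss = thinR x (cutR b xss)

-- Assumes b >= 1.
mssR :: Int -> [Integer] -> [Integer]
mssR b = maxWith sum . map concat . scanr (opR b) []
```

This version still recomputes sums, lengths and concatenations at every step, which is exactly the cost that the tupling and symmetric lists described next are meant to remove.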

The final step is to ensure that all the length, concat, init, sum, and ++ operations
are implemented efficiently. Firstly, we tuple partitions and segments with
their sums and lengths. Secondly, since partitions are processed at both ends, we
need symmetric lists (see Chapter 3) to ensure that init and cons operations take
constant time. Finally, to make segment concatenation efficient, we introduce an
accumulating function. Here are the relevant definitions:

    type Partition = (Sum, Length, SymList Segment)
    type Segment = (Sum, Length, [Integer] → [Integer])
    type Sum = Integer
    type Length = Nat

We use the functions sumP, lenP, and segsP to extract the components of a partition,
and sumS, lenS, and segS to extract the components of a segment. The function opR
is replaced by opP, defined by

    opP b x xss = thinP x (cutP b xss)

where cutP is defined by

    cutP :: Length → Partition → Partition
    cutP b xss = if lenP xss == b then initP xss else xss

    initP :: Partition → Partition
    initP (s,k,xss) = (s - t, k - m, initSL xss) where (t,m,_) = lastSL xss

and thinP is defined by

    thinP :: Integer → Partition → Partition
    thinP x xss = add (x, 1, ([x] ++)) xss

    add :: Segment → Partition → Partition
    add xs xss | sumS xs > 0   = consP xs xss
               | lenP xss == 0 = emptyP
               | otherwise     = add (catS xs (headP xss)) (tailP xss)

The subsidiary functions are defined by

    consP :: Segment → Partition → Partition
    consP xs (s,k,xss) = (sumS xs + s, lenS xs + k, consSL xs xss)

    emptyP :: Partition
    emptyP = (0, 0, nilSL)

    headP :: Partition → Segment
    headP xss = headSL (segsP xss)

    tailP :: Partition → Partition
    tailP (s,k,xss) = (s - t, k - m, tailSL xss) where (t,m,_) = headSL xss

    catS :: Segment → Segment → Segment
    catS (s,k,f) (t,m,g) = (s + t, k + m, f . g)

The final definition of mss is now given by

    mss b = extract . maxWith sumP . scanr (opP b) emptyP

    extract :: Partition → [Integer]
    extract = concatMap (flip segS []) . fromSL . segsP

We have flip segS [] xs = segS xs [], so the accumulating function of a segment is
applied to the empty list at the very end of the computation, and the results are
concatenated to produce the final answer.

It remains to time the program. With the exception of add, all the other functions
appearing in opP take constant time. The function add takes an additional number
of steps proportional to the number of segments deleted. But the total number of
segments deleted cannot exceed the total number added, which is at most n for an
input of length n. Thus add takes amortised constant time. Computing extract can
take O(b) steps, so the total time for computing a short segment with maximum
sum is O(n + b) = O(n) steps, as we promised at the outset.


11.4 Chapter notes 

There are a number of books on stringology, including [1, 2, 6]. All three of these
texts discuss the longest common subsequence problem and other related problems,
such as the edit-distance problem and the problem of optimal alignment. Gusfield [6]
describes the reduction of the longest common subsequence problem to the longest
upsequence problem used in this chapter. The upsequence problem is a favourite
example in formal program design for showing the use of loop invariants, and is
treated in [5, 4] as well as in a number of other places.

The maximum-sum short segment problem was discussed in [7]. Other problems
about finding segments with various properties are described in [3] and [8].

Exercises

Exercise 11.1 Precisely how many segments and subsequences are there of a list
of n distinct elements? How many segments are there of length at most b?

Exercise 11.2 Write down a definition of subseqs that produces subsequences in
ascending order of length. No length calculations are allowed.

Exercise 11.3 With the definitions of ≼ and ok given in the longest upsequence
problem, we claimed

    xs ≼ ys ⇒ x:xs ≼ x:ys
    xs ≼ ys ∧ ok x ys ⇒ ok x xs

Prove these claims.

Exercise 11.4 Can the definition of ≼ in the longest upsequence problem be
replaced by

    xs ≼ ys = length xs ≥ length ys ∧ xs ≥ ys

or not?

Exercise 11.5 Suppose we defined an upsequence to be one whose elements are
only weakly increasing. Thus we change up to read

    up xs = and (zipWith (≤) xs (tail xs))

Write down a definition of tstep for which lwus = last . foldr tstep [[]].

Exercise 11.6 As mentioned in the text, the longest upsequence problem can be
solved in O(n log r) steps by using a balanced binary search tree. The aim of the
following three exercises is to construct such a solution. The material depends on
Section 4.3 and Section 4.4, so reread those sections first. Recall the definition

    data Tree a = Null | Node Int (Tree a) a (Tree a)

from Section 4.3. A list xss = [xs_0, xs_1, ..., xs_k] of upsequences is represented by a
tree t of type Tree [a] such that flatten t = xss. The leftmost value xs_0 is the empty
sequence. As a warm-up exercise, define the function rmost that returns the last
entry xs_k.

Exercise 11.7 Following on, the new definition of lus takes the form

    lus :: Ord a ⇒ [a] → [a]
    lus = rmost . foldr update (Node 1 Null [] Null)
      where update x t = modify x (split x t)

The value of split x t is a pair of trees, the first of which is a tree whose labels consist
of the empty list and lists y:xs for which y > x, and the second is a tree whose labels
are lists y:xs for which y ≤ x. This function is defined exactly as in Section 4.4:

    split :: Ord a ⇒ a → Tree [a] → (Tree [a], Tree [a])
    split x t = sew (pieces x t [])

However, the definition of pieces is different. This time we have

    pieces :: Ord a ⇒ a → Tree [a] → [Piece [a]] → [Piece [a]]

where, as in Section 4.4, we have

    data Piece a = LP (Tree a) a | RP a (Tree a)

Recall that a left piece LP l x is missing its right subtree, and a right piece RP x r is
missing its left subtree. The definition of pieces x t ps is different because the labels
of t, apart from the leftmost label [], are in decreasing rather than increasing order.
Give the modified definition of pieces.

Exercise 11.8 The definition of sew is the same as in Section 4.4, so it remains to
define modify x (t1, t2). If t2 is not Null, then modify returns a tree that results from
combining t1 and a modified tree obtained from t2 by replacing the leftmost label of
t2 with x:xs, where xs is the rightmost label of t1. If t2 is Null, then a new node with
label x:xs is created. As a final task, define modify in terms of the function combine
from Section 4.4.

Exercise 11.9 Write down the definitions of encode and decode for which

    lcs xs ys = decode (lus (encode xs ys)) ys

Show that each upsequence of encode xs ys corresponds to a common subsequence
of xs and ys with the same length.

Exercise 11.10 One way of defining the function position is by using a helper
function:

    position xs ys = help (length xs) (reverse xs) (reverse ys)

Define help, making sure that the result is negative if ys is not a subsequence of xs.

Exercise 11.11 Recall that for a given xs the preorder ≼ for the longest common
subsequence problem was defined by

    ys ≼ zs = length ys ≥ length zs ∧ position xs ys ≥ position xs zs

Show that

    ys ≼ zs ∧ sub xs zs ⇒ sub xs ys
    ys ≼ zs ⇒ y:ys ≼ y:zs

Hence justify the refinement

    tstep y (ThinBy (≼) yss) ← ThinBy (≼) (step y yss)

where tstep y yss ← ThinBy (≼) (step y yss).

Exercise 11.12 Express tails as an instance of scanr and inits as an instance of 
scanl. 

Exercise 11.13 Show that msp b {x : xs) is a prefix of x: msp b xs. 

Exercise 11.14 A similar but much simpler problem about segments is to find a 
segment with maximum sum with no length restrictions: 

    mss ← MaxWith sum . segments

Write down a definition of msp for which

    mss ← MaxWith sum . map msp . tails

Find a function step for which msp = foldr step [], and hence construct a simple
linear-time algorithm for mss.


Answers 

Answer 11.1 Every element can be included or excluded in a subsequence, giving
2^n subsequences in total. The number of nonempty segments of length j is n - j + 1,
so the total number of nonempty segments is

    Σ_{j=1}^{n} (n - j + 1) = Σ_{j=1}^{n} j = n(n + 1)/2

The number of nonempty segments of length at most b is

    Σ_{j=1}^{b} (n - j + 1) = Σ_{j=0}^{b-1} (n - j) = bn - Σ_{j=0}^{b-1} j = bn - b(b - 1)/2

Answer 11.2 Perhaps the simplest method is to maintain a list of lists of
subsequences, the first list being all the subsequences of length 0, the second list all the
subsequences of length 1, and so on. This list can be updated as each new element is
processed, and then can be concatenated at the end of the computation. Thus we have

    subseqs = concat . foldr op [[[]]]

where

    op :: a → [[[a]]] → [[[a]]]
    op x (xss : xsss) = xss : step x xss xsss

    step x xss [] = [map (x:) xss]
    step x xss (yss : ysss) = (map (x:) xss ++ yss) : step x yss ysss

Answer 11.3 We have

    x:xs ≼ x:ys
  ⇐ { definition of ≼ }
    length xs ≥ length ys
  ⇐ { definition of ≼ }
    xs ≼ ys

The second claim is immediate if both xs and ys are the empty sequence. Otherwise
we can argue

    u:us ≼ v:vs ∧ ok x (v:vs)
  ⇒ { definition of ≼ and ok }
    u ≥ v ∧ x < v
  ⇒ { definition of ok }
    ok x (u:us)

Answer 11.4 No. We have xs ≼ [] for all xs, so the empty list would be removed
by any thinning step.

Answer 11.5 The only change is to replace > by ≥:

    tstep x (xs : xss) = xs : search xs x xss
      where search xs x [] = [x:xs]
            search xs x (ys : xss) | head ys ≥ x = ys : search ys x xss
                                   | otherwise   = (x:xs) : xss

Answer 11.6 The definition of rmost is

    rmost :: Tree [a] → [a]
    rmost (Node _ l xs Null) = xs
    rmost (Node _ l xs r) = rmost r

Answer 11.7 The definition of pieces is

    pieces x Null ps = ps
    pieces x (Node _ l xs r) ps
      | null xs ∨ (x < head xs) = pieces x r (LP l xs : ps)
      | otherwise               = pieces x l (RP xs r : ps)

Answer 11.8 The definition of modify is

    modify :: a → (Tree [a], Tree [a]) → Tree [a]
    modify x (t1, t2) = combine t1 (replace (x : rmost t1) t2)

    replace :: [a] → Tree [a] → Tree [a]
    replace xs Null = Node 1 Null xs Null
    replace xs (Node h Null ys r) = Node h Null xs r
    replace xs (Node h l ys r) = Node h (replace xs l) ys r

Answer 11.9 Here are possible definitions:

    encode xs ys = concatMap (posns ys) xs
    posns ys x = reverse [i | (i,y) ← zip [0..] ys, y == x]

    decode us ys = pick us (zip [0..] ys)

where

    pick [] pys = []
    pick (u:us) ((p,y):pys) = if u == p then y : pick us pys
                              else pick (u:us) pys

Each upsequence of encode xs ys obviously decodes to a subsequence of ys of the
same length since any list of increasing positions in ys corresponds to a subsequence
of ys. Each upsequence also corresponds to a subsequence of xs, as we can see by
defining

    decode1 us xs ys = pick us [(posns ys x, x) | x ← xs]

where

    pick [] psxs = []
    pick (u:us) ((ps,x):psxs) = if u ∈ ps then x : pick us psxs
                                else pick (u:us) psxs

Then decode1 (lus (encode xs ys)) xs ys decodes to a subsequence of xs.

Answer 11.10 The definition is

    help p xs [] = p
    help p [] ys = -1
    help p (x:xs) (y:ys)
      | x == y    = help (p - 1) xs ys
      | otherwise = help (p - 1) xs (y:ys)
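
As a quick check, the definition can be run against the positions quoted in Section 11.2. The sketch below adds type signatures, which are our own, and is otherwise a direct transcription.

```haskell
-- Rightmost starting position of ys as a subsequence of xs,
-- or -1 if ys is not a subsequence of xs.
position :: Eq a => [a] -> [a] -> Int
position xs ys = help (length xs) (reverse xs) (reverse ys)

-- Scan both reversed lists, counting the position down from length xs.
help :: Eq a => Int -> [a] -> [a] -> Int
help p _ [] = p
help _ [] _ = -1
help p (x : xs) (y : ys)
  | x == y    = help (p - 1) xs ys
  | otherwise = help (p - 1) xs (y : ys)
```

For example, position "baabca" "ba" is 3 and position "baabca" "" is 6, matching the values given in the text.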

Answer 11.11 For the first condition it is sufficient to observe that

    position xs ys ≥ position xs zs

implies that ys is a subsequence of xs if zs is.

For the second condition we can prove that

    position xs ys ≥ position xs zs ⇒ position xs (y:ys) ≥ position xs (y:zs)

by case analysis: either position xs (y:zs) = -1, in which case the result is
immediate, or position xs (y:zs) ≥ 0, in which case both y:zs and y:ys are subsequences
of xs and the position of y:ys is at least as large as the position of y:zs.

For the last part, we argue

    ThinBy (≼) (step y yss)
  = { definition of step }
    ThinBy (≼) (yss ++ filter (sub xs) (map (y:) yss))
  = { distributive law of ThinBy }
    ThinBy (≼) (ThinBy (≼) yss ++
                ThinBy (≼) (filter (sub xs) (map (y:) yss)))
  → { thin-filter law }
    ThinBy (≼) (ThinBy (≼) yss ++
                filter (sub xs) (ThinBy (≼) (map (y:) yss)))
  = { thin-map law }
    ThinBy (≼) (ThinBy (≼) yss ++
                filter (sub xs) (map (y:) (ThinBy (≼) yss)))
  → { given tstep y yss ← ThinBy (≼) (step y yss) }
    tstep y (ThinBy (≼) yss)

Answer 11.12 We have

    tails = scanr (λx xs. [x] ++ xs) []
    inits = scanl (λxs x. xs ++ [x]) []

Answer 11.13 Suppose msp b xs = ys and suppose to the contrary that x:ys is a
proper prefix of msp b (x:xs). That means

    msp b (x:xs) = x : ys ++ zs

for some nonempty sequence zs. But

    sum (ys ++ zs) = sum ys + sum zs ≤ sum ys

by definition of msp b xs, and

    x + sum ys < x + sum ys + sum zs

by definition of msp b (x:xs), so sum zs is both positive and nonpositive, giving rise to
a contradiction.

Answer 11.14 We have

msp ← MaxWith sum . inits

We can find a greedy algorithm for msp, maintaining a prefix with maximum sum at
each step. Tupling sum computations, we then have

msp = snd . foldr step (0, [])
step x (s,xs) = if x + s > 0 then (x + s, x:xs) else (0, [])

And now

mss = snd . maxWith fst . scanr step (0, [])


By definition, a partition of a nonempty list is a division of the list into nonempty
segments. For example, ["par", "tit", "i", "on"] is one partition of the string 
"partition". Partitions arise in a variety of problems. For instance, the segment 
problem of the previous chapter involved partitioning the prefixes of a list to achieve 
efficiency. In one version of Mergesort the input is partitioned into runs of
nondecreasing elements before merging. In operations research, the scheduling of a
sequence of activities can often be specified in terms of partitioning the activities. 
Partitions also arise in various data-compression and text-processing algorithms. In 
this chapter we will confine ourselves to just two examples. The first is a simple 
scheduling problem, while the second involves breaking paragraphs into lines. 


12.1 Ways of generating partitions 

First of all, let us look at some of the ways we can generate all the partitions of a 
list. A partition of a list of type [A] has type [ [A] ], so a list of partitions has type 
[ [ [A] ] ]. To improve readability, we introduce the type synonyms 

type Partition a = [Segment a]
type Segment a   = [a]

A list of partitions now has the more readable type [Partition a]. By definition, xss
is a partition of xs just in the case that

concat xss = xs ∧ all (not . null) xss

In particular, the empty list is the only partition of the empty list. The following 
recursive definition of parts can be derived from the specification above: 

parts :: [a] -> [Partition a]
parts [] = [[]]
parts xs = [ys:yss | (ys,zs) <- splits xs, yss <- parts zs]

Each partition is generated by taking a nonempty prefix of the input list as the first
segment, and then following it with a partition of the remaining suffix. The function
splits returns all the ways of splitting a nonempty list xs into a pair of lists (ys,zs)
such that ys is nonempty and ys ++ zs = xs:

splits :: [a] -> [([a],[a])]
splits [] = []
splits (x:xs) = ([x],xs) : [(x:ys,zs) | (ys,zs) <- splits xs]
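
As a sanity check, the two definitions above can be transcribed into plain Haskell and run. The sketch below is a direct transcription; the helper isPartitionOf is not from the text but a hypothetical name for the specification given earlier.

```haskell
type Segment a   = [a]
type Partition a = [Segment a]

-- All ways of splitting a list into a nonempty prefix and the remaining suffix.
splits :: [a] -> [([a], [a])]
splits []     = []
splits (x:xs) = ([x], xs) : [(x:ys, zs) | (ys, zs) <- splits xs]

-- All partitions: choose a nonempty first segment, then partition the rest.
parts :: [a] -> [Partition a]
parts [] = [[]]
parts xs = [ys:yss | (ys, zs) <- splits xs, yss <- parts zs]

-- The specification of a partition (hypothetical helper name).
isPartitionOf :: Eq a => Partition a -> [a] -> Bool
isPartitionOf xss xs = concat xss == xs && all (not . null) xss
```

For instance, parts "abc" evaluates to [["a","b","c"],["a","bc"],["ab","c"],["abc"]], and every element of that list satisfies isPartitionOf with respect to "abc".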

There are other ways of defining parts, including inductive definitions based on
either foldr or foldl. One definition of parts in terms of foldr is

parts :: [a] -> [Partition a]
parts = foldr (concatMap . extendl) [[]]

where extendl extends a partition on the left:

extendl :: a -> Partition a -> [Partition a]
extendl x [] = [cons x []]
extendl x p  = [cons x p, glue x p]

cons, glue :: a -> Partition a -> Partition a
cons x p     = [x]:p
glue x (s:p) = (x:s):p

The two ways of extending a nonempty partition with a new element on the left are
to start a new segment, or to 'glue' the element onto the first segment, provided
such a segment exists.

The corresponding definition of parts in terms of foldl is

parts :: [a] -> [Partition a]
parts = foldl (flip (concatMap . extendr)) [[]]

where, this time, extendr extends a partition on the right:

extendr :: a -> Partition a -> [Partition a]
extendr x [] = [snoc x []]
extendr x p  = [snoc x p, bind x p]

snoc, bind :: a -> Partition a -> Partition a
snoc x p = p ++ [[x]]
bind x p = init p ++ [last p ++ [x]]

The functions snoc and bind are the dual variants of cons and glue (bind has the
merit of being pronounceable, while eulg is not). Of course, snoc and bind do not
take constant time, but we can deal with that problem as and when the need arises.
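
To see that the two fold-based definitions enumerate the same partitions, here is a minimal runnable sketch. The names partsR and partsL (for the foldr and foldl versions) and the check sameParts are our own labels, introduced only so that both definitions can live in one module.

```haskell
import Data.List (sort)

type Segment a   = [a]
type Partition a = [Segment a]

cons, glue :: a -> Partition a -> Partition a
cons x p     = [x] : p
glue x (s:p) = (x:s) : p
glue _ []    = error "glue: empty partition"

extendl :: a -> Partition a -> [Partition a]
extendl x [] = [cons x []]
extendl x p  = [cons x p, glue x p]

-- Right-to-left generation of partitions.
partsR :: [a] -> [Partition a]
partsR = foldr (concatMap . extendl) [[]]

snoc, bind :: a -> Partition a -> Partition a
snoc x p = p ++ [[x]]
bind x p = init p ++ [last p ++ [x]]

extendr :: a -> Partition a -> [Partition a]
extendr x [] = [snoc x []]
extendr x p  = [snoc x p, bind x p]

-- Left-to-right generation of partitions.
partsL :: [a] -> [Partition a]
partsL = foldl (flip (concatMap . extendr)) [[]]

-- Hypothetical check: the two enumerations agree up to order.
sameParts :: Ord a => [a] -> Bool
sameParts xs = sort (partsR xs) == sort (partsL xs)
```

The two functions produce the partitions in different orders: partsR "abc" is [["a","b","c"],["ab","c"],["a","bc"],["abc"]], while partsL "abc" is [["a","b","c"],["a","bc"],["ab","c"],["abc"]].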

It seems like a free choice as to whether to use a definition of parts in terms of
foldr or foldl, but for some problems the right choice is important. Many problems
about partitions ask for a partition in which all of its component segments satisfy
some property, ok say. Consider the task of proving that

filter (all ok) . parts = foldr (concatMap . okextendl) [[]]

where okextendl, which returns the ok left-extensions, is defined by

okextendl x = filter (ok . head) . extendl x

The context-sensitive fusion condition is that

filter (all ok) (concatMap (extendl x) ps) =
    concatMap (okextendl x) (filter (all ok) ps)

for all partitions ps of the same list. To prove it, one needs the assumption that ok
is suffix-closed, meaning that, if ok (xs ++ ys) holds, then so does ok ys. Details are
left as an exercise. Dually, if we start out with the definition of parts in terms of
foldl, then the required assumption is that ok is prefix-closed, meaning that ok xs
holds if ok (xs ++ ys) does. Many predicates, including those used in the following
sections, are both prefix-closed and suffix-closed, so there is a free choice of which
definition of parts to adopt. But sometimes only one of these properties holds, and
that dictates the choice of definition for parts.
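
The fused equation can be tried out on a concrete suffix-closed predicate. The sketch below instantiates ok with ordered (nondecreasing), which is both prefix-closed and suffix-closed. The name okParts and the decision to pass ok as an explicit argument are assumptions made for illustration.

```haskell
type Segment a   = [a]
type Partition a = [Segment a]

-- Left extensions: start a new segment (cons) or glue onto the first one.
extendl :: a -> Partition a -> [Partition a]
extendl x []    = [[[x]]]
extendl x (s:p) = [[x]:s:p, (x:s):p]

parts :: [a] -> [Partition a]
parts = foldr (concatMap . extendl) [[]]

-- Fused version: only the new first segment is tested at each step.
okParts :: (Segment a -> Bool) -> [a] -> [Partition a]
okParts ok = foldr (concatMap . okextendl) [[]]
  where okextendl x = filter (ok . head) . extendl x

-- A predicate that is both prefix-closed and suffix-closed.
ordered :: Ord a => [a] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))
```

For example, okParts ordered [1,2,1] and filter (all ordered) (parts [1,2,1]) both evaluate to [[[1],[2],[1]],[[1,2],[1]]].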


12.2 Managing two bank accounts 

Our first problem is a simple example of a scheduling problem. It can be introduced 
in the following way. A certain individual, whom we will call Zakia, has two 
online bank accounts, a current account and a savings account. Zakia uses the
current account only for a fixed and known sequence of transactions (deposits and 
withdrawals), such as salary, standing orders, and utility bills. For security reasons, 
Zakia never wants more than a certain amount C in her current account, where C 
is some fixed amount assumed to be at least as large as any single transaction. To 
maintain this security condition, Zakia wants to set up an automatic sequence of 
transfers between her current and deposit accounts so that at the beginning of each 
group of transactions money can be transferred into or out of the current account 
to cope with the next group of transactions. To minimise traffic, Zakia wants the 
number of such transfers to be as small as possible. 

Abstractly stated, the problem is to find a shortest partition of a list of positive
and negative integers into a list of safe segments. A segment [x1, x2, ..., xk] is safe if
there is an amount r, the residue in the current account at the beginning of such a
sequence, such that all of the sums

r, r + x1, r + x1 + x2, ..., r + x1 + x2 + ... + xk

lie between 0 and the given bound C. For example, if C = 100, the sequence
[-20,40,60,-30] is safe because we can take r = 20. But [40,-50,10,80,20]
is not safe because r has to be at least 10 to cope with the first withdrawal and
10 + 40 - 50 + 10 + 80 + 20 = 110, which is greater than C. It is left as an exercise
to show that, if a segment is safe, then so is every prefix and suffix of the segment.

To simplify the safety condition, let m and n be the maximum and minimum of
the sums 0, x1, x1 + x2, ..., x1 + x2 + ... + xk, so n ≤ 0 ≤ m. Then it is required that
there exists an r such that 0 ≤ r + n ≤ C and 0 ≤ r + m ≤ C. These two conditions
are equivalent to m ≤ C + n (see the exercises). Hence, supposing C is provided as
a global value c, we can define

safe :: Segment Int -> Bool
safe xs = maximum sums <= c + minimum sums
          where sums = scanl (+) 0 xs
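
The two example segments can be checked mechanically. In the sketch below the bound is passed as an explicit first argument rather than read from a global value, and residue is a hypothetical helper returning the smallest opening balance that works when the segment is safe.

```haskell
-- Safety test with the bound c supplied as an argument (an assumption of
-- this sketch; the text uses a global value instead).
safe :: Int -> [Int] -> Bool
safe c xs = maximum sums <= c + minimum sums
  where sums = scanl (+) 0 xs

-- Smallest residue r that keeps every running balance nonnegative.
residue :: [Int] -> Int
residue xs = negate (minimum (scanl (+) 0 xs))
```

With these definitions, safe 100 [-20,40,60,-30] is True with residue 20, while safe 100 [40,-50,10,80,20] is False.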

The function msp (a minimum safe partition) can now be specified by

msp :: [Int] -> Partition Int
msp ← MinWith length . filter (all safe) . parts

The function msp returns a partition, not the sequence of transfers that have to be 
made between the two accounts. We will leave it as an exercise to show how the 
transfers can be computed from the final partition. 

The first step in the standard recipe is to fuse the filter operation with the generation
of partitions. Since safe is both prefix-closed and suffix-closed, we can use
either definition of parts. Choosing the definition of parts in terms of foldr, we
obtain

msp ← MinWith length . safeParts

where safeParts is defined by

safeParts = foldr (concatMap . safeExtendl) [[]]
safeExtendl x = filter (safe . head) . extendl x

At each step only safe partitions are computed. It is assumed that every singleton 
transaction is safe, so a new transaction can always start a new segment. But it can 
only be glued to a segment if the result is safe, which means that the segment itself 
is also safe. 

The next step in the recipe is to introduce thinning. Before doing so, we should 
first check whether or not a greedy algorithm is possible. Consider, for example, the 
transactions [4,4,3,-3,5]. Taking C = 10, there are two safe partitions of shortest
length, namely [[4],[4,3,-3,5]] and [[4,4],[3,-3,5]]. While the former can be
extended to a safe partition [[5,4],[4,3,-3,5]] of length 2 by gluing 5, the second
one cannot, because [5,4,4] is not a safe segment. It follows that we cannot get 
away with maintaining an arbitrary shortest safe partition. But that leaves open the 
possibility of a greedy algorithm with a modified cost function 

cost p = (length p, length (head p))

In words, we may be able to maintain a shortest partition whose first segment is
also as short as possible. Such a definition would be perfectly acceptable because
minimising cost also minimises length. Recalling the standard calculation for a
greedy algorithm, we can reason

    MinWith cost . concatMap (safeExtendl x)
  =   { distributing MinWith cost }
    MinWith cost . map (MinWith cost . safeExtendl x)
  →   { with add x ← MinWith cost . safeExtendl x }
    MinWith cost . map (add x)
  →   { greedy condition (see below) }
    add x . MinWith cost

The definition of add can be simplified to read

add x []    = [[x]]
add x (s:p) = if safe (x:s) then (x:s):p else [x]:s:p

In words, a partition with a cheaper cost is obtained by gluing rather than starting a
new segment. The context-sensitive greedy condition holds if

cost p1 ≤ cost p2 ⇒ cost (add x p1) ≤ cost (add x p2)

for any two partitions p1 and p2 of the same list, all of whose segments are safe.

To see whether or not the greedy condition holds, consider the four possible
values of q1 = add x p1 and q2 = add x p2, namely

q1 = cons x p1    q2 = cons x p2    (12.1)
q1 = cons x p1    q2 = glue x p2    (12.2)
q1 = glue x p1    q2 = cons x p2    (12.3)
q1 = glue x p1    q2 = glue x p2    (12.4)

Firstly, suppose |p1| < |p2|, where, for brevity, |p| abbreviates length p. Then |q1| <
|q2| except for case (12.2). But in this case we have |q1| ≤ |q2| and |head q1| <
|head q2|, and therefore cost q1 ≤ cost q2 for all values of q1 and q2.

Secondly, suppose |p1| = |p2| and |s1| ≤ |s2|, where s1 = head p1 and s2 = head p2.
By the assumption that p1 and p2 are partitions into safe segments of the same list,
it follows that s1 is a prefix of s2. Here case (12.2) cannot arise. In the remaining
three cases it is easy to check that cost q1 ≤ cost q2. So the greedy condition does
indeed hold.

That means the following greedy algorithm solves the bank accounts problem: 

msp :: [Int] -> Partition Int
msp = foldr add []
      where add x []    = [[x]]
            add x (s:p) = if safe (x:s) then (x:s):p else [x]:s:p

Ignoring the cost of computing safe, this is a linear-time algorithm. Computation of
safe can be made to take constant time by tupling and is left as an exercise.
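
The greedy algorithm is short enough to run directly. The following sketch passes the bound as an explicit argument c (an assumption; the text treats it as a global value) and recomputes safe from scratch at each step, so it is not yet the linear-time version.

```haskell
type Partition a = [[a]]

safe :: Int -> [Int] -> Bool
safe c xs = maximum sums <= c + minimum sums
  where sums = scanl (+) 0 xs

-- Greedy algorithm: glue each transaction onto the first segment whenever
-- the result is still safe, otherwise start a new segment.
msp :: Int -> [Int] -> Partition Int
msp c = foldr add []
  where
    add x []    = [[x]]
    add x (s:p) = if safe c (x:s) then (x:s):p else [x]:s:p
```

For example, msp 10 [4,4,3,-3,5] gives [[4],[4,3,-3,5]], one of the two shortest safe partitions mentioned earlier, and the one whose first segment is shortest.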

The lesson to be learned from the bank accounts problem is that it is as well to 
check whether a greedy algorithm is possible for a problem before embarking on an 
attack by thinning. But, out of interest, suppose we had gone ahead with a thinning 
strategy anyway. Then we would have 

msp ← MinWith length . ThinBy (≼) . safeParts

where ≼ has to be chosen so that

p1 ≼ p2 ⇒ length p1 ≤ length p2

A sensible choice of ≼ is the partial preorder

p1 ≼ p2 = length p1 ≤ length p2 ∧ length (head p1) ≤ length (head p2)

With this choice one can establish the fusion condition

ThinBy (≼) . step x → ThinBy (≼) . step x . ThinBy (≼)

where step x = concatMap (safeExtendl x). Hence

msp = minWith length . foldr tstep [[]]
      where tstep x = thinBy (≼) . concatMap (safeExtendl x)

This algorithm thins at each step. Moreover, one can prove by induction that, with 
the definition of thinBy given in Chapter 8, at most two partitions are kept at each 
stage. Therefore a thinning algorithm based on ≼ will be almost as efficient as the
greedy one. 

There is another point of interest. Both the greedy algorithm and the thinning 
algorithm may return a schedule in which transfers occur before they seem necessary. 
For example, with C = 100 we obtain 

msp [50,20,30,-10,40,-90,-20,60,70,-40,80]
  = [[50], [20,30,-10,40,-90], [-20,60], [70], [-40,80]]

whereas the alternative solution

[[50,20,30,-10], [40,-90], [-20,60], [70,-40], [80]]

also has length five and might seem less suspicious to any tracking software employed
by Zakia's bank that might reasonably expect transfers to occur only when
necessary. Exercise 12.12 asks for a solution to this problem.


12.3 The paragraph problem 

The paragraph problem is the problem of splitting a text into lines in the best 
possible way. To begin with, we introduce the following type synonyms:

type Text = [Word]
type Word = [Char]
type Para = [Line]
type Line = [Word]

It is assumed that a text consists of a nonempty sequence of words, each word being 
a nonempty sequence of non-space characters. A paragraph therefore consists of at 
least one line. 

The major constraint on paragraphs is that all lines have to fit into a specified 
width. For simplicity, we assume a single globally defined value maxWidth that 
gives the maximum width a line can possess. A reasonable generalisation, which 
we will not pursue, is to allow different lines to have different maximum widths. 
For example, paragraphs in newspapers often are arranged with varying widths to 
fit alongside pictures with varying contours. Instead we specify 

para :: Text -> Para
para ← MinWith cost . filter (all fits) . parts

The function fits determines whether a line will fit into the required width:

fits :: Line -> Bool
fits line = width line <= maxWidth

width :: Line -> Nat
width = foldrn add length where add w n = length w + 1 + n

The function foldrn, a general fold over nonempty lists, was defined in Chapter 8.
The width of a line consisting of a single word is the length of the word, while the 
width of a line consisting of at least two words is the sum of the lengths of the words 
plus the number of inter-word spaces. This definition is appropriate when every 
character, including the space character, has the same width, but it can be adapted 
to fonts in which characters have different widths. It is assumed that no single word 
exceeds the maximum line width, so para is well-defined for every input. 
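
A small runnable sketch of width and fits follows. The definition of foldrn given here is what we assume the Chapter 8 definition to be, Int stands in for Nat, and the value of maxWidth is purely illustrative.

```haskell
import Prelude hiding (Word)

type Word = String
type Line = [Word]

maxWidth :: Int
maxWidth = 10   -- illustrative value only

-- Fold over nonempty lists (assumed to match the Chapter 8 definition).
foldrn :: (a -> b -> b) -> (a -> b) -> [a] -> b
foldrn _ g [x]    = g x
foldrn f g (x:xs) = f x (foldrn f g xs)
foldrn _ _ []     = error "foldrn: empty list"

-- Word lengths plus one space between each pair of adjacent words.
width :: Line -> Int
width = foldrn add length
  where add w n = length w + 1 + n

fits :: Line -> Bool
fits line = width line <= maxWidth
```

For example, width ["the","cat"] is 3 + 1 + 3 = 7, so that line fits, whereas ["hello","world"] has width 11 and does not.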

It remains to define cost and to choose a definition of parts either in terms of foldr
or in terms of foldl. The predicate fits is both prefix-closed and suffix-closed, so it
seems like a free choice. However, if we use foldr, then we can arrive at solutions
that, like Zakia's bank accounts problem, allow short first lines in order to ensure
longer subsequent lines. Such a paragraph might look strange,
so we will use foldl instead.

That means we can fuse the filtering with the generation of partitions to arrive at 

para ← MinWith cost . fitParts

where

fitParts = foldl (flip (concatMap . fitExtend)) [[]]
           where fitExtend x = filter (fits . last) . extendr x

Only those partitions whose lines fit into the maximum width are generated at each
step.

Finally, how should we define the cost of a paragraph? There are at least five
reasonable answers. Firstly, we could define

cost1 = length

Here a best possible paragraph is one with the fewest lines. We could also define

cost2 = sum . map waste . init
        where waste line = maxWidth - width line

Here the cost of a paragraph is the sum of the waste of each line, taken over all lines
except the very last (where wasted space does not detract from the appearance). A
third definition sums the squares of the wasted space:

cost3 = sum . map waste . init
        where waste line = (optWidth - width line)^2

The definition depends on another globally defined constant optWidth, whose value
is at most maxWidth and which specifies the optimum width of each line of a
paragraph. With this version, which is similar to the one used in TeX, lines that
deviate only a little from the optimum width are penalised less heavily than with
cost2. Finally, two more definitions of cost are

cost4 = foldr max 0 . map waste . init
        where waste line = maxWidth - width line

cost5 = foldr max 0 . map waste . init
        where waste line = (optWidth - width line)^2

Here it is the maximum waste that is minimised. Use of foldr max 0 rather than
maximum is needed to ensure that the cost of a paragraph consisting of a single
line is zero. The last four definitions of cost assume that a paragraph is a nonempty
sequence of lines (init is undefined on an empty list), but we can also set the cost of
an empty paragraph to zero.

There is an obvious greedy algorithm for the paragraph problem: 

greedy = foldl add []
         where add [] w = snoc w []
               add p w  = head (filter (fits . last) [bind w p, snoc w p])

The algorithm works by adding each word to the end of the last line of the current 
paragraph until no more words will fit, in which case a new line is started. A more 
efficient version is discussed in the exercises. This algorithm is essentially the one 
used by Microsoft Word and many other word processors. 
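
The greedy algorithm can be run as it stands once the supporting definitions are supplied. In this sketch maxWidth = 10 is an illustrative choice, and width is written with sum rather than foldrn, which gives the same result on nonempty lines.

```haskell
import Prelude hiding (Word)

type Word = String
type Line = [Word]
type Para = [Line]

maxWidth :: Int
maxWidth = 10   -- illustrative value only

-- Word lengths plus inter-word spaces (equivalent to the foldrn version
-- on nonempty lines).
width :: Line -> Int
width ws = sum (map length ws) + length ws - 1

fits :: Line -> Bool
fits line = width line <= maxWidth

snoc, bind :: Word -> Para -> Para
snoc w p = p ++ [[w]]
bind w p = init p ++ [last p ++ [w]]

-- Add each word to the last line if it fits, otherwise start a new line.
greedy :: [Word] -> Para
greedy = foldl add []
  where
    add [] w = snoc w []
    add p  w = head (filter (fits . last) [bind w p, snoc w p])
```

Applied to six words of lengths 6, 1, 5, 3, 4 and 7, say ["aaaaaa","b","ccccc","ddd","eeee","fffffff"], greedy returns [["aaaaaa","b"],["ccccc","ddd"],["eeee"],["fffffff"]].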

So, for which definition of cost does the greedy algorithm work? The answer is
that cost has to satisfy two properties. Firstly, provided the result fits, adding a new
word to the end of a line is never worse than starting a new line:

fits (last (bind w p)) ⇒ cost (bind w p) ≤ cost (snoc w p)

Secondly, as should be familiar by now, the greedy condition

cost p1 ≤ cost p2 ⇒ cost (add p1 w) ≤ cost (add p2 w)

should hold. The greedy condition does not hold when the cost of a paragraph is
simply the number of lines (see the exercises), but it does if we strengthen this
measure by redefining cost1 to read

cost1 p = (length p, width (last p))

That is to say, a best paragraph is one that minimises the number of lines and,
among such paragraphs, one that has a shortest last line. The proof is similar to the
one in the bank accounts problem. As in the previous proof, let q1 = add p1 w and
q2 = add p2 w. There are four possible cases:

q1 = bind w p1    q2 = bind w p2    (12.5)
q1 = bind w p1    q2 = snoc w p2    (12.6)
q1 = snoc w p1    q2 = bind w p2    (12.7)
q1 = snoc w p1    q2 = snoc w p2    (12.8)

Suppose cost1 p1 ≤ cost1 p2. Firstly, if |p1| < |p2|, where again |p| abbreviates
length p, then |q1| < |q2| except in case (12.7). But in case (12.7) we have

|q1| ≤ |q2| ∧ width (last q1) < width (last q2)

which implies cost1 q1 < cost1 q2. Secondly, suppose

|p1| = |p2| ∧ width (last p1) ≤ width (last p2)

Here, case (12.7) cannot arise. In cases (12.5) and (12.8) we have

|q1| = |q2| ∧ width (last q1) ≤ width (last q2)

while in case (12.6) we have |q1| < |q2|. So cost1 q1 ≤ cost1 q2 in all cases. The
greedy algorithm therefore minimises the number of lines in a paragraph.

The greedy algorithm also works for cost2, the cost function that sums the waste
of each line except the last. We claim that

cost1 p1 ≤ cost1 p2 ⇒ cost2 p1 ≤ cost2 p2

For the proof, suppose p1 consists of the lines [l_{1,1}, l_{1,2}, ..., l_{1,k}], with w_{1,j} as the
width of l_{1,j}. Then, abbreviating maxWidth to M, we have

cost2 p1 = (M - w_{1,1}) + (M - w_{1,2}) + ... + (M - w_{1,k-1})
         = (k - 1) M - (T - (w_{1,k} + k - 1))

where T is the total width of the text. Thus (T - (w_{1,k} + k - 1)) is the sum of the
widths of all lines except the last because k - 1 inter-word spaces are replaced by
newlines. Similarly, if p2 consists of the lines [l_{2,1}, l_{2,2}, ..., l_{2,m}], then

cost2 p2 = (m - 1) M - (T - (w_{2,m} + m - 1))

Suppose cost1 p1 ≤ cost1 p2, so (k, w_{1,k}) ≤ (m, w_{2,m}). If k < m, then

cost2 p2 ≥ cost2 p1 + M + w_{2,m} - w_{1,k} > cost2 p1

because w_{1,k} ≤ M. If, on the other hand, k = m and w_{1,k} ≤ w_{2,m}, then

cost2 p2 = cost2 p1 + w_{2,m} - w_{1,k} ≥ cost2 p1

In either case we have cost2 p1 ≤ cost2 p2.

However, the greedy algorithm does not work for the other definitions of cost 
described above. Take maxWidth = 10 and optWidth = 8 and consider the two 
partitions 

p1 = [[w6,w1],[w5,w3],[w4],[w7]]
p2 = [[w6],[w1,w5],[w3,w4],[w7]]

in which length wi = i for each word wi. The partition p1 is the one returned by the
greedy algorithm. We have

cost3 p1 = sum [(8 - 8)^2, (8 - 9)^2, (8 - 4)^2] = 17
cost3 p2 = sum [(8 - 6)^2, (8 - 7)^2, (8 - 8)^2] = 5

cost4 p1 = maximum [10 - 8, 10 - 9, 10 - 4] = 6
cost4 p2 = maximum [10 - 6, 10 - 7, 10 - 8] = 4

cost5 p1 = maximum [8 - 8, 8 - 9, 8 - 4] = 4
cost5 p2 = maximum [8 - 6, 8 - 7, 8 - 8] = 2

With all these measures of cost, p2 is a better partition than p1, so the greedy
algorithm does not lead to the best solution. That means we need a thinning algorithm
for these particular cost functions.
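
The figures for cost3 and cost4 can be reproduced with a few lines of Haskell. In this sketch the helper w, which builds a word of a given length, is a hypothetical stand-in for the words wi of the example, and width is written with sum rather than foldrn.

```haskell
type Line = [String]
type Para = [Line]

maxWidth, optWidth :: Int
maxWidth = 10
optWidth = 8

width :: Line -> Int
width ws = sum (map length ws) + length ws - 1

-- cost3 sums squared deviations from optWidth; cost4 takes the largest
-- shortfall from maxWidth. Both ignore the last line.
cost3, cost4 :: Para -> Int
cost3 = sum . map waste . init
  where waste line = (optWidth - width line) ^ (2 :: Int)
cost4 = foldr max 0 . map waste . init
  where waste line = maxWidth - width line

-- Hypothetical stand-in for a word of length i.
w :: Int -> String
w i = replicate i 'x'

p1, p2 :: Para
p1 = [[w 6, w 1], [w 5, w 3], [w 4], [w 7]]
p2 = [[w 6], [w 1, w 5], [w 3, w 4], [w 7]]
```

Evaluating (cost3 p1, cost3 p2) gives (17, 5) and (cost4 p1, cost4 p2) gives (6, 4), in agreement with the calculation above.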

More generally, we will describe a thinning algorithm for any admissible cost
function, meaning that if

cost p1 ≤ cost p2 ∧ width (last p1) = width (last p2)

then

cost (bind w p1) ≤ cost (bind w p2) ∧ cost (snoc w p1) ≤ cost (snoc w p2)

As can easily be checked, all the cost functions introduced above are admissible
cost functions.

Suppose p1 and p2 satisfy these two conditions. Then for any completion

q2 = init p2 ++ [last p2 ++ l0] ++ [l1] ++ ... ++ [lk]

of p2 to a full paragraph, there is a similar completion

q1 = init p1 ++ [last p1 ++ l0] ++ [l1] ++ ... ++ [lk]

of p1. Moreover, cost q1 ≤ cost q2. Hence the partial paragraph p2 can never lead to
a better solution than p1 and can be eliminated from the computation. Note carefully
that this conclusion depends on the last lines of p1 and p2 having equal widths: if
the last line of p1 had width smaller than that of p2, then every valid completion of
p2 remains a valid completion of p1, but the cost of the latter may not be smaller
than the cost of the former.

Taken together, all of this means thinning with ≼ is appropriate, where

p1 ≼ p2 = cost p1 ≤ cost p2 ∧ width (last p1) == width (last p2)

However, instead of using thinBy (≼) we can customise the thinning step by
keeping the list of partitions ps in increasing order of width of last line. Then the
partitions in map (bind w) ps are also in this order. Moreover, the partitions in
map (snoc w) ps all have the same last line, the shortest one possible. Thinning this
list means retaining only the single partition

minWith cost (map (snoc w) ps)

when beginning a new line. Therefore thinning can be implemented by the following
definition:

para = minWith cost . foldl tstep [[]]
       where tstep [[]] w = [[[w]]]
             tstep ps w   = minWith cost (map (snoc w) ps) :
                            filter (fits . last) (map (bind w) ps)

It is easy to see that at most M = maxWidth partitions are kept in play at each step,
since no last line can have width more than M. We will leave it to the exercises
to show how to memoise cost and width, and how to implement snoc and bind
efficiently, so that tstep takes O(M) steps. Hence the paragraph problem for n words
takes O(Mn) steps. It is possible with a more sophisticated algorithm to eliminate
the dependence of this bound on M for certain definitions of cost, but we will not
go into the details.
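
To see the thinning algorithm at work, here is a self-contained sketch specialised to the squared-waste cost (cost3 in the text). The definition of minWith is assumed rather than quoted, the constants are the illustrative values used earlier, and no memoisation is attempted, so the sketch does not achieve the O(Mn) bound.

```haskell
import Prelude hiding (Word)

type Word = String
type Line = [Word]
type Para = [Line]

maxWidth, optWidth :: Int
maxWidth = 10
optWidth = 8

width :: Line -> Int
width ws = sum (map length ws) + length ws - 1

fits :: Line -> Bool
fits line = width line <= maxWidth

snoc, bind :: Word -> Para -> Para
snoc w p = p ++ [[w]]
bind w p = init p ++ [last p ++ [w]]

-- Squared deviation from optWidth, summed over all lines but the last.
cost :: Para -> Int
cost = sum . map waste . init
  where waste line = (optWidth - width line) ^ (2 :: Int)

-- Assumed definition: the leftmost element with the smallest f-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- Keep the best paragraph that starts a new line with w, plus every
-- paragraph whose last line can still absorb w.
para :: [Word] -> Para
para = minWith cost . foldl tstep [[]]
  where
    tstep [[]] w = [[[w]]]
    tstep ps w   = minWith cost (map (snoc w) ps)
                   : filter (fits . last) (map (bind w) ps)
```

On the words ["aaaaaa","b","ccccc","ddd","eeee","fffffff"], whose lengths are 6, 1, 5, 3, 4 and 7, para returns [["aaaaaa"],["b","ccccc"],["ddd","eeee"],["fffffff"]], which has cost 5, whereas the greedy layout of the same words has cost 17.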


12.4 Chapter notes 

The problem of managing two bank accounts is an updated reworking of the security 
van problem invented by Hans Zantema and discussed in Section 7.5 of [2]. There 
are many articles on the paragraph problem, including two, [1] and [3], written by 
ourselves. In [3] it is shown how to remove the dependence on the maximum line 
width in the running time for some definitions of cost. For a thorough discussion of 
the line-breaking algorithm used in TeX see [4].

Exercises

Exercise 12.1 How many partitions of a list of length n > 0 are there? 

Exercise 12.2 Why is the clause parts [] = [[]] necessary in the first definition of
parts?

Exercise 12.3 Give another definition of parts in terms of foldr, one that at each
step does all the cons operations before the glue operations.

Exercise 12.4 Give the details of the proof that

filter (all ok) . parts = foldr (concatMap . okextendl) [[]]

provided ok is suffix-closed. (Hint: it is probably best to express the fusion condition 
in terms of list comprehensions.) 

Exercise 12.5 Which of the following predicates on nonempty sequences of positive
numbers are prefix-closed and which are suffix-closed?

leftmin xs  = all (head xs <=) xs
rightmax xs = all (<= last xs) xs
ordered xs  = and (zipWith (<=) xs (tail xs))
nomatch xs  = and (zipWith (/=) xs [0..])

Does each of these predicates hold for singleton lists?

Exercise 12.6 Suppose that n ≤ 0 ≤ m. Show that

(∃r : 0 ≤ r + n ≤ C ∧ 0 ≤ r + m ≤ C) ⇔ m ≤ C + n

Exercise 12.7 Show that the predicate safe in the bank accounts problem is both
prefix-closed and suffix-closed.

Exercise 12.8 Suppose C = 10. What is the value of msp [2,4,50,3] when msp is
the greedy algorithm for the bank accounts problem and when msp is defined by the
original specification?

Exercise 12.9 The function add in the bank accounts problem does not take constant
time because the safety test can take linear time. But we can represent a
partition p by a triple

(p, minimum (sums (head p)), maximum (sums (head p)))

where sums = scanl (+) 0. Write down a new definition of msp that does take linear
time.

Exercise 12.10 The function msp returns a partition, not the transfers that have to 
be made to keep the current account in balance. Show how to define 

transfers :: Partition Int -> [Int]

by computing a pair (n, r) of nonnegative numbers for each segment, where n is the
minimum that has to be in the current account to ensure the segment is safe and r is
the residue after the transactions in the segment.

Exercise 12.11 Consider the thinning algorithm for the bank accounts problem.
Suppose that at some point in the computation there are two partitions of the form
[y]:ys:p and (y:ys):p. This could happen as early as the second step, producing the
partitions [[y],[z]] and [[y,z]]. Show that adding in a new element x and thinning
the result will produce either a single partition, or two partitions of the above form.

Exercise 12.12 How can Zakia address the suspicious feature of the given solution 
to the bank accounts problem, namely that transfers can occur before they are 
absolutely necessary? 

Exercise 12.13 The function runs used in Mergesort is specified by 

runs :: Ord a => [a] -> Partition a
runs ← MinWith length . filter (all ordered) . parts

Without looking back to the section on Mergesort, write down a greedy algorithm 
for computing runs. Why does the greedy algorithm work? 

Exercise 12.14 Show that the greedy condition fails when the cost of a paragraph 
is simply the number of lines. 

Exercise 12.15 The greedy algorithm for the paragraph problem can be made more
efficient in two steps. This exercise deals with the first step and the following
exercise with the second step. Consider the function help specified by

p ++ help l ws = foldl add (p ++ [l]) ws

Prove that

greedy (w:ws) = help [w] ws
                where help l []     = [l]
                      help l (w:ws) = if width l' <= maxWidth
                                      then help l' ws else l : help [w] ws
                                      where l' = l ++ [w]

Exercise 12.16 For the second step, memoise width and eliminate the concatenation
with the help of an accumulating function parameter.

Exercise 12.17 In the thinning version of the paragraph problem, can we replace
filter by takeWhile?

Exercise 12.18 Show that the cost functions described in the text for the paragraph 
problem are all admissible. 

Exercise 12.19 With some admissible cost functions, the thinning algorithm may 
select a paragraph with minimum cost but whose length is not as short as possible. 
How can this deficiency be overcome? 

Exercise 12.20 The refinement 

snoc w . MinWith cost ← MinWith cost . map (snoc w)

follows from the condition

cost p1 ≤ cost p2 ⇒ cost (snoc w p1) ≤ cost (snoc w p2)

Does this condition hold for cost3?

Exercise 12.21 Suppose we had gone for a right-to-left thinning algorithm for the 
paragraph problem, using a definition of parts based on foldr. This time a cost 
function is admissible if 

cost (glue w p1) ≤ cost (glue w p2) ∧ cost (cons w p1) ≤ cost (cons w p2)

provided that

cost p1 ≤ cost p2 ∧ width (head p1) = width (head p2)

As can be checked, all five cost functions introduced in the text are admissible in
this sense. Write down the associated thinning algorithm. Give an example to show
that the two different thinning algorithms produce different results for cost3.

Exercise 12.22 The final exercise is to make the thinning algorithm for the paragraph
problem more efficient. Setting rmr = reverse . map reverse, we can represent
a paragraph p by a triple

(rmr p, cost p, width (last p))

The last two components memoise cost and width, while the first component means
that snoc and bind can be implemented in terms of cons and glue. More precisely,
we have

snoc w . rmr = rmr . cons w
bind w . rmr = rmr . glue w

Write down the resulting algorithm, assuming the cost function is cost^. 


Answers 

Answer 12.1 There are 2^(n-1) partitions.

Answer 12.2 Because with the single clause

parts xs = [ys:yss | (ys,zs) <- splits xs, yss <- parts zs]

we would have parts [] = [], from which it follows that parts xs = [] for all xs.

Answer 12.3 The definition is

parts = foldr step [[]]
        where step x [[]] = [[[x]]]
              step x ps   = map (cons x) ps ++ map (glue x) ps

Answer 12.4 In terms of list comprehensions, the fusion condition takes the form

[p' | p <- ps, p' <- extendl x p, all ok p']
  = [p' | p <- ps, all ok p, p' <- extendl x p, ok (head p')]

for all partitions ps of the same list. With the given definition of extendl, the fusion
condition follows if we can show that

[cons x p | p <- ps, all ok (cons x p)]
  = [cons x p | p <- ps, all ok p ∧ ok (head (cons x p))]

[glue x (s:p) | s:p <- ps, all ok (glue x (s:p))]
  = [glue x (s:p) | s:p <- ps, all ok (s:p) ∧ ok (head (glue x (s:p)))]

Since cons x p = [x]:p and

all ok ([x]:p) = all ok p ∧ ok [x]

the first condition holds. Since glue x (s:p) = (x:s):p and, provided ok is suffix-closed,
we have

all ok ((x:s):p) = all ok p ∧ ok (x:s)
                 = all ok p ∧ ok s ∧ ok (x:s)
                 = all ok (s:p) ∧ ok (x:s)

the second condition holds.

Answer 12.5 The predicates leftmin and nomatch are prefix-closed but not suffix- 
closed, while rightmax is suffix-closed but not prefix-closed. Finally, ordered is 
both prefix-closed and suffix-closed. All predicates hold for singletons (in the case 
of nomatch no positive integer is 0).

Answer 12.6 We can reason

      (∃r : 0 ≤ r + n ≤ C ∧ 0 ≤ r + m ≤ C)
    ⇔   { arithmetic }
      (∃r : -n ≤ r ≤ C - n ∧ -m ≤ r ≤ C - m)
    ⇔   { arithmetic }
      (∃r : max (-n) (-m) ≤ r ≤ min (C - n) (C - m))
    ⇔   { assuming n ≤ 0 ≤ m }
      (∃r : -n ≤ r ≤ C - m)
    ⇔   { logic }
      m ≤ C + n

Answer 12.7 If all the sums r, r+x1, r+x1+x2, ..., r+x1+x2+···+xk lie between
0 and C, then certainly every prefix of these sums does too. Taking r′ = r+x1, we
have that all the sums r′, r′+x2, ..., r′+x2+···+xk also lie between 0 and C, so
safe is suffix-closed as well as prefix-closed.

Answer 12.8 For the greedy algorithm we have

    msp [2,4,50,3] = [[2,4], [50], [3]]

but for the original specification the answer is the undefined value. Since the segment
[50] is not safe, there is no partition into safe segments.

Answer 12.9 We have

    msp = part · foldr add ([ ],0,0)

where

    part (p,n,m) = p

    add x pnm | null (part pnm)   = cons x pnm
              | safe (glue x pnm) = glue x pnm
              | otherwise         = cons x pnm

    cons x (p,n,m)   = ([x]:p, min 0 x, max 0 x)
    glue x (s:p,n,m) = ((x:s):p, min 0 (x + n), max 0 (x + m))
    safe (p,n,m)     = max 0 (-n) ≤ min c (c - m)

Answer 12.10 The values (n, r) for the segments in a partition can be computed by
the function endpoints, where

    endpoints :: [Int] → (Int,Int)
    endpoints xs = if n < 0 then (-n, x - n) else (0, x)
      where n    = minimum sums
            x    = last sums
            sums = scanl (+) 0 xs

For example, 

    map endpoints [[40,-85,55], [-32,79], [80], [-21,80]]
      = [(45,55), (32,79), (0,80), (21,80)]

The current account has to have a balance of 45 to ensure the first segment is safe.
At the end of the segment we can transfer 55 - 32 to the savings account to ensure a
credit of 32 for the next segment; and so on. Hence we can define

    transfers = collect · map endpoints

    collect :: [(Int,Int)] → [Int]
    collect xys = zipWith (-) (map fst xys ++ [0]) ([0] ++ map snd xys)

For example,

    collect [(45,55), (32,79), (0,80), (21,80)] = [45, -23, -79, -59, -80]

Assuming a zero balance at the beginning, 45 has to be transferred to the current
account to ensure the first segment is safe; the remaining amounts are what can be
transferred to the deposit account at the end of each segment, leaving a zero balance
in the current account at the end of all the transactions.
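
The definitions in Answer 12.10 can be checked directly. The following is a minimal sketch in plain Haskell; the type signature given to transfers is our assumption (a partition of integers represented as a list of lists), and the rest is a transcription of the definitions above.

```haskell
-- Smallest opening balance that keeps a segment safe, paired with
-- the balance left at the end of the segment.
endpoints :: [Int] -> (Int, Int)
endpoints xs = if n < 0 then (negate n, x - n) else (0, x)
  where
    n    = minimum sums      -- lowest running total within the segment
    x    = last sums         -- net change over the whole segment
    sums = scanl (+) 0 xs

-- Differences between what each segment needs at its start and what
-- the previous segment left behind.
collect :: [(Int, Int)] -> [Int]
collect xys = zipWith (-) (map fst xys ++ [0]) ([0] ++ map snd xys)

-- Assumed signature: a partition is a list of segments.
transfers :: [[Int]] -> [Int]
transfers = collect . map endpoints
```

A positive entry is an amount moved into the current account and a negative entry is an amount moved out of it, matching the example above.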

Answer 12.11 After adding x, there are three possible lists of partitions that can
result:

    [[x]:[y]:ys:p, [x,y]:ys:p, [x]:(y:ys):p, (x:y:ys):p]
    [[x]:[y]:ys:p, [x,y]:ys:p, [x]:(y:ys):p]
    [[x]:[y]:ys:p, [x]:(y:ys):p]

Furthermore,

    [x]:(y:ys):p ≼ [x]:[y]:ys:p
    [x]:(y:ys):p ≼ [x,y]:ys:p

Hence, after thinning by thinBy (≼) the following partitions are left in each case:

    [[x]:(y:ys):p, (x:y:ys):p]
    [[x]:(y:ys):p]
    [[x]:(y:ys):p]

The first pair of partitions also has the same form as in the question, so at most two
partitions are generated at each step.

Answer 12.12 The obvious answer is for Zakia to use a greedy algorithm that
processes from left to right:

    msp = foldl add [ ]
      where add [ ] x = [[x]]
            add p x   = head (filter (safe · last) [bind x p, snoc x p])

The answer to Exercise 12.15 shows how to make this version efficient. The validity
of the left-to-right algorithm depends on the fact that safe is prefix-closed.

Answer 12.13 The definition is

    runs :: Ord a ⇒ [a] → Partition a
    runs = foldr add [ ]
      where add x [ ]   = [[x]]
            add x (s:p) = if ordered (x:s) then (x:s):p else [x]:s:p

The greedy algorithm works because exactly the same reasoning as in the bank
accounts problem applies, with safe replaced by ordered. Furthermore, the test in
the definition of add can be simplified to x ≤ head s.
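
As a small illustration, here is a sketch of runs with the simplified test x ≤ head s. We assume that Partition a is a synonym for [[a]] and write the type out in full; the simplification is valid because every segment already in the partition is nonempty and ordered.

```haskell
-- Greedy partition of a list into maximal ascending runs.
runs :: Ord a => [a] -> [[a]]
runs = foldr add []
  where
    add x []      = [[x]]
    -- s is nonempty and ordered, so x : s is ordered exactly when x <= head s
    add x (s : p) = if x <= head s then (x : s) : p else [x] : s : p
```

For instance, runs [1,3,2,2,5,4] yields [[1,3],[2,2,5],[4]], and concatenating the result always gives back the input.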

Answer 12.14 Take maxWidth = 10 and consider the two paragraphs

    p1 = [[6,1],[5,3],[4]]
    p2 = [[6],[1,5],[3,4]]

both of which have the same length. We have

    add 4 p1 = [[6,1],[5,3],[4,4]]
    add 4 p2 = [[6],[1,5],[3,4],[4]]

so the greedy condition fails.

Answer 12.15 We have

    foldl add (p ++ [l]) [ ] = p ++ [l]

so help l [ ] = [l]. Next, if add p w = bind w p, then we have

    foldl add (p ++ [l]) (w:ws) = foldl add (p ++ [l ++ [w]]) ws

which shows that help l (w:ws) = help (l ++ [w]) ws. Finally, if add p w = snoc w p,
then

    foldl add (p ++ [l]) (w:ws) = foldl add (p ++ [l] ++ [[w]]) ws

which shows that help l (w:ws) = l : help [w] ws.

Answer 12.16 The result is

    greedy (w:ws) = help ((w:), length w) ws

where

    help (f,d1) [ ] = [f [ ]]
    help (f,d1) (w:ws)
      | d2 ≤ maxWidth = help (f · (w:), d2) ws
      | otherwise     = f [ ] : help ((w:), d) ws
      where d2 = d1 + 1 + d
            d  = length w

Answer 12.17 Yes, the paragraphs are in increasing width of last line, so testing
can be abandoned as soon as a last line does not fit. 

Answer 12.18 The first inequality holds because cost (bind w p) = cost p. For the
second inequality we have

    cost (snoc w p) = cost p ⊕ waste (last p)

where ⊕ is either + or max. The result follows because ⊕ is monotonic and the
waste of a line depends only on its width.

Answer 12.19 Define a new cost function cost′ p = (cost p, length p). The function
cost′ is admissible if cost is.

Answer 12.20 No. Take the two paragraphs

    p1 = [[6,1],[5,3],[4]]
    p2 = [[6],[1,5],[3,4]]

whose costs, assuming maxWidth = 10 and optWidth = 8, are 1 and 5 respectively.
We have

    cost₃ (snoc 4 p1) = cost₃ [[6,1],[5,3],[4],[4]] = 17
    cost₃ (snoc 4 p2) = cost₃ [[6],[1,5],[3,4],[4]] = 5

Answer 12.21 The thinning algorithm is

    para = minWith cost · foldr tstep [[ ]]
      where tstep w [[ ]] = [[[w]]]
            tstep w ps    = cons w (minWith cost ps) :
                            filter (fits · head) (map (glue w) ps)

Take maxWidth = optWidth = 16. Here is just one example that shows different
outputs:

    Here is just            Here is just one
    one example that        example that
    shows different         shows different
    outputs:                outputs:

The paragraph on the left was produced by the right-to-left algorithm, while the one
on the right was produced by the left-to-right one. The widths of the first layout are
[12,16,15,8] while those of the second are [16,12,15,8] so the costs are the same.


Answer 12.22 The algorithm is

    para = thePara · minWith cost · thinparts

where

    thePara (p,_,_)  = reverse (map reverse p)
    cost (_,c,_)     = c
    ok (_,_,k)       = k ≤ maxWidth
    thinparts (w:ws) = foldl step (start w) ws
    start w          = [([[w]], 0, length w)]
    step ps w        = minWith cost (map (snoc w) ps) :
                       takeWhile ok (map (bind w) ps)
    snoc w (p,c,k)   = (cons w p, c + (optWidth - k)², length w)
    bind w (p,c,k)   = (glue w p, c, k + 1 + length w)


The term Dynamic Programming was coined by Richard Bellman in 1950 to describe
his research into multi-stage decision processes. The word programming was
chosen as a synonym for planning to mean the process of determining the sequence 
of decisions that have to be made, while dynamic suggested the evolution of the 
system over time. These days, dynamic programming as a technique of algorithm 
design means something much more specific. It involves a two-stage process in 
which a problem, usually but not necessarily an optimisation problem, is formulated 
in recursive terms and then some efficient way of computing the solution is found. 
Unlike a divide-and-conquer problem, the subproblems generated by the recursive 
solution can overlap, so naive execution of the recursive algorithm will involve 
solving the same subproblem many times over, possibly an exponential number of 
times. 

One way to understand the problem of overlap is to look at the dependency 
graph associated with a recursive function. This is a directed graph whose vertices 
represent function calls and whose directed edges show the dependency of each call 
on recursive calls. While the dependency graph of a divide-and-conquer algorithm 
is a tree of some kind with no shared vertices, the graph of a dynamic programming 
algorithm is an acyclic directed graph, possibly with many shared vertices. A vertex 
is shared if there is more than one incoming edge to the vertex. 

The first job in solving an optimisation problem by dynamic programming is 
simply to obtain a recursive solution. As with thinning algorithms, the key step 
is to exploit a suitable monotonicity condition. This condition enables an optimal 
solution to a problem to be expressed in terms of optimal sub-solutions. When the 
shape of the recursion is inductive, a thinning algorithm is appropriate; when it is 
not, the techniques of dynamic programming come into play. 

Having obtained a recursive description of the solution, there are basically two 
ways to ensure that sub-solutions are not computed more than once. One is called 
memoisation. Here the recursive, top-down structure of the computation is preserved 
but sub-solutions are remembered and stored in a table for subsequent retrieval. 
Thus at each recursive call one first checks to see whether the call has been made 
before, in which case the solution is retrieved from the table; otherwise the solution 
is computed recursively and the result is stored. 
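
To make the lookup-then-store discipline concrete, here is a minimal sketch of memoisation for the Fibonacci recursion studied in the next chapter. It is an illustration under our own assumptions rather than a scheme used in the text: the table is a Data.Map threaded explicitly through the recursion, and the names memoFib and fibMemo are hypothetical.

```haskell
import qualified Data.Map as M

-- Top-down memoisation with an explicitly threaded table.
-- Returns the requested value together with the updated table.
memoFib :: Int -> M.Map Int Integer -> (Integer, M.Map Int Integer)
memoFib n table
  | n <= 1    = (fromIntegral n, table)
  | otherwise = case M.lookup n table of
      Just v  -> (v, table)                   -- call made before: retrieve it
      Nothing ->
        let (a, t1) = memoFib (n - 1) table   -- otherwise compute recursively,
            (b, t2) = memoFib (n - 2) t1      -- reusing entries stored so far
            v       = a + b
        in  (v, M.insert n v t2)              -- and store the result

fibMemo :: Int -> Integer
fibMemo n = fst (memoFib n M.empty)
```

The recursive structure is unchanged; only the bookkeeping around each call is new. The keys must support comparison, which is the map-based counterpart of the requirement that arguments be usable as table indices.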

The second method, and the one we will focus on, is called tabulation. Here, 
the computation switches to a bottom-up scheme in which, by careful planning (or 
‘programming’), the simplest partial results are computed first, and then solutions to 
larger subproblems are computed in an appropriate order until the complete solution 
is obtained. For some problems, installing a tabulation can be viewed as the problem 
of finding a shortest path in a suitable layered network derived from the dependency 
graph. We considered the layered network problem in Chapter 10, and we will look 
at it again in the following chapter.

Each approach, the top-down and bottom-up methods, has its advantages and
disadvantages. Memoisation is in principle easy to install but does require a systematic
way of coding the arguments of the recursive function so that they can be used as 
indices in a table, usually an array of some kind. These arguments also have to be 
testable for equality. A top-down approach ensures that only those values actually 
needed for the full computation are computed. Tabulation requires a more wholesale 
change to the structure of the solution but, if the tabulation scheme is chosen well, 
each solution can be determined easily from the solutions to the associated
subproblems. On the other hand, some simple tabulation schemes may involve computing
the solutions to subproblems not actually required for the full solution. 

The aim of the next two chapters is to look at a number of problems for which 
dynamic programming is a viable technique, and to examine the various kinds 
of tabulation scheme that can arise. In imperative programming most tabulation 
schemes involve arrays of various kinds, but in functional programming other 
representations can prove superior. 



Chapter 13 


Efficient recursions 


In this chapter we introduce the essential ideas of dynamic programming by looking 
at the recursive formulation of some simple problems, examining the dependency 
graph associated with each recursion, and finding a suitable tabulation scheme 
for implementing the recursion efficiently. Most problems for which dynamic 
programming is appropriate are optimisation problems of one kind or another, 
but the first two problems, the Fibonacci function and the problem of computing 
binomial coefficients, are not. We will also give dynamic programming solutions 
for the knapsack problem of Chapter 10 and the longest common subsequence 
problem of Chapter 11. Two additional problems, the minimum-edit problem and 
the shuttle-bus problem, are also described. All these examples illustrate the range 
of possibilities for different tabulation schemes. 


13.1 Two numeric examples 

Perhaps the simplest example of a recursion that involves the same calculation being 
repeated many times over is the Fibonacci function: 

    fib :: Nat → Integer
    fib n = if n ≤ 1 then fromIntegral n else fib (n - 1) + fib (n - 2)

We use Integer arithmetic for the result since values of fib grow large very quickly.
Direct evaluation of fib on an argument n > 1 involves fib (k + 1) evaluations of fib
on the argument n - k for 1 ≤ k < n, so direct evaluation takes an exponential number
of steps (see Exercise 13.1).

The dependency graph of the computation of fib for n = 7 is pictured in
Figure 13.1. This is a directed acyclic graph with a single root, labelled with 7, and
directed edges from a node to the two recursive calls associated with the node.

[Figure 13.1 The dependency graph of fib 7 (vertices 7, 6, 5, 4, 3, 2, 1, 0)]

One way of making the computation more efficient is to use a one-dimensional
array:

    fib :: Nat → Integer
    fib n = a ! n
      where a   = tabulate f (0,n)
            f i = if i ≤ 1 then fromIntegral i else a ! (i - 1) + a ! (i - 2)

The function tabulate is defined by

    tabulate :: Ix i ⇒ (i → e) → (i,i) → Array i e
    tabulate f bounds = array bounds [(x, f x) | x ← range bounds]

The declaration a = tabulate f (0,n) in the definition of fib builds an array a whose
ith entry for i > 1 is the unevaluated expression a ! (i - 1) + a ! (i - 2). Thus tabulate
takes linear time. Array entries are evaluated only when required, and then they are
evaluated at most once. Therefore the above definition of fib takes linear time. We
will use tabulate again when tabulating with arrays.
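
The array-based definition can be run as it stands once the typeset symbols are written in ASCII. In this sketch we assume that Nat is a synonym for Int, and we rename the parameter bounds to bnds only to avoid shadowing the function of the same name exported by Data.Array.

```haskell
import Data.Array

type Nat = Int   -- assumption: Nat is represented by Int

tabulate :: Ix i => (i -> e) -> (i, i) -> Array i e
tabulate f bnds = array bnds [(x, f x) | x <- range bnds]

fib :: Nat -> Integer
fib n = a ! n
  where
    a   = tabulate f (0, n)
    -- each entry refers lazily to earlier entries of the same array
    f i = if i <= 1 then fromIntegral i else a ! (i - 1) + a ! (i - 2)
```

The definition relies on laziness: the array a is defined in terms of itself, and an entry is computed only when it is first demanded.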

However, using an array for the tabulation of fib is overkill because at each step of 
the computation only the two previous values of fib are required. The table therefore 
need consist of only two entries. This observation leads to the following simple 
definition: 

    fib :: Nat → Integer
    fib n = fst (apply n step (0,1))
      where step (a,b) = (b, a + b)

The ‘table’ consists of a pair of values. It is easy to show by induction that

    apply n step (0,1) = (fib n, fib (n + 1))

so the above program is correct. This solution also takes linear time. In fact there is
even a logarithmic-time algorithm for computing fib, see Exercise 13.3.
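
For completeness, here is the two-entry version as a runnable sketch. The definition of apply is not given in this section, so the one below is our assumption of its meaning (n-fold composition of a function), and Nat is again taken to be Int.

```haskell
type Nat = Int   -- assumption: Nat is represented by Int

-- Assumed definition: apply n f is f composed with itself n times.
apply :: Nat -> (a -> a) -> a -> a
apply n f = if n == 0 then id else f . apply (n - 1) f

fib :: Nat -> Integer
fib n = fst (apply n step (0, 1))
  where
    -- the 'table' is the pair (fib i, fib (i + 1))
    step (a, b) = (b, a + b)
```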

The second example concerns computing binomial coefficients. The standard
definition is, of course,

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

and can be easily implemented by

    binom :: (Nat,Nat) → Integer
    binom (n,r) = fact n div (fact r × fact (n - r))
      where fact n = product [1 .. fromIntegral n]

[Figure 13.2 Computation of binom (6,3): a grid of vertices (n,r) running from (6,3)
down to (3,3) and (3,0)]


We can also define binomial coefficients recursively. If 0 < r < n, then

$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$$

Furthermore,

$$\binom{n}{0} = \binom{n}{n} = 1$$

That leads to the following recursive definition of binom:

    binom :: (Nat,Nat) → Integer
    binom (n,r) = if r == 0 ∨ r == n then 1
                  else binom (n - 1, r) + binom (n - 1, r - 1)

Like the Fibonacci function, this definition of binom can take exponential time if 
executed directly. 

The dependency graph for binom on the argument (6,3) is pictured in Figure 13.2. 
It takes the form of a two-dimensional grid, so a simple tabulation scheme can be 
based on a two-dimensional array: 

    binom :: (Nat,Nat) → Integer
    binom (n,r) = a ! (n,r)
      where a       = tabulate f ((0,0),(n,r))
            f (i,j) = if j == 0 ∨ i == j then 1
                      else a ! (i - 1, j) + a ! (i - 1, j - 1)

The function tabulate was defined above. However, half of the entries, namely
those for (i,j) where i < j, consist of the undefined value ⊥, so the array program is
wasteful of space.

A better solution can be based on a single list. Observe that the values of binom
for the grid in Figure 13.2 are given by

    20 10  4  1
    10  6  3  1
     4  3  2  1
     1  1  1  1

and that each row consists of the running sums, reading from right to left, of the
elements in the row below it. That means we can define

    binom (n,r) = head (apply (n - r) (scanr1 (+)) (replicate (r + 1) 1))

The function scanr1, a variant of scanr and defined only for nonempty lists, is
another Standard Prelude function whose values are illustrated by

    scanr1 (⊕) [x1,x2,x3] = [x1 ⊕ (x2 ⊕ x3), x2 ⊕ x3, x3]

This method takes r(n - r) additions to compute $\binom{n}{r}$, but no multiplications.
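
The list-based scheme can be tried out as follows. This is a sketch under the same assumptions as before (Nat is Int, and apply is n-fold composition); the helper rows is hypothetical and is included only to display the successive rows of the grid, starting with the bottom one.

```haskell
type Nat = Int   -- assumption: Nat is represented by Int

-- Assumed definition: apply n f is f composed with itself n times.
apply :: Nat -> (a -> a) -> a -> a
apply n f = if n == 0 then id else f . apply (n - 1) f

binom :: (Nat, Nat) -> Integer
binom (n, r) = head (apply (n - r) (scanr1 (+)) (replicate (r + 1) 1))

-- Hypothetical helper: the rows of the grid from the bottom row upwards.
rows :: (Nat, Nat) -> [[Integer]]
rows (n, r) = take (n - r + 1) (iterate (scanr1 (+)) (replicate (r + 1) 1))
```

For the argument (6,3), rows produces [1,1,1,1], [4,3,2,1], [10,6,3,1] and [20,10,4,1] in that order, and binom returns the head of the last row.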


13.2 Knapsack revisited 


For our next example of dynamic programming, let us take a second look at the 
knapsack problem from Section 10.4. Recall the following declarations: 


    type Name      = String
    type Value     = Nat
    type Weight    = Nat
    type Item      = (Name, Value, Weight)
    type Selection = ([Name], Value, Weight)

    name (n,_,_)   = n
    value (_,v,_)  = v
    weight (_,_,w) = w


In Section 10.4 we specified swag by

    swag :: Weight → [Item] → Selection
    swag w ← MaxWith value · choices w

where choices was defined by a foldr. This time we define the choices recursively:

    choices :: Weight → [Item] → [Selection]
    choices w [ ]    = [([ ],0,0)]
    choices w (i:is) = if w < wi then choices w is
                       else choices w is ++ map (add i) (choices (w - wi) is)
                       where wi = weight i

    add :: Item → Selection → Selection
    add i (ns,v,w) = (name i : ns, value i + v, weight i + w)

Each item is considered in turn and, weight permitting, either added to the selection
or not.

[Figure 13.3 Knapsack with capacity 5, and four items with weights 3, 2, 2, 1]


It is easy to show that the monotonicity condition

    value sn1 ≤ value sn2 ⇒ value (add i sn1) ≤ value (add i sn2)

holds. That means

    add i · MaxWith value ← MaxWith value · map (add i)

Using this fact and the distributive law of MaxWith, an easy calculation gives us the
following recursive version of swag:

    swag :: Weight → [Item] → Selection
    swag w [ ]    = ([ ],0,0)
    swag w (i:is) = if w < wi then swag w is
                    else better (swag w is) (add i (swag (w - wi) is))
                    where wi = weight i

    better :: Selection → Selection → Selection
    better sn1 sn2 = if value sn1 ≥ value sn2 then sn1 else sn2

In words, if there are no items to choose from, then the result is the empty selection 
with zero weight and zero value. Otherwise the choice is the better of packing the 
next item, assuming the weight of the knapsack allows it, and not packing it. In 
either case, the remaining selection is the best possible for the remaining items and 
the remaining capacity.

Suppose the carrying capacity of the knapsack is 5 and there are four items
to choose from, with weights 3,2,2,1. The dependency graph for this instance is
pictured in Figure 13.3. A pair (w,r) represents the problem of computing swag
when the capacity of the knapsack is w and the last r items are left to choose from.
In this instance there are only two shared values, at (3,1) and (0,1), but in general 
there will be many more. One straightforward tabulation scheme is to use a two- 
dimensional array. A more space-efficient alternative is to reuse a one-dimensional 
array, building the solution column by column from right to left, each column being 
represented by the entries in a single array. Yet a third way is to recast the problem 
as one of computing a path of maximum value in a layered network. Each layer 
is a column in the dependency graph and the edges go from layer to layer. We 
considered the layered network problem in Chapter 10, except that there we were 
looking for a path of minimum cost rather than maximum value. If the capacity of 
the knapsack is w and there are n items, then finding a best path will take O(nw)
steps, so the dynamic programming algorithm has the same asymptotic complexity 
as the thinning algorithm of Chapter 10. However, the computational overhead in 
recasting the knapsack problem as a layered network problem is quite large. 

Instead, we will build the solution column by column from right to left, but 
using a list rather than an array. For example, here is part of Figure 13.3 redrawn 
horizontally to show the dependence of each column, now a row, on the one below 
it. The row also shows the dependencies for (4,2) and (1,2), values not required in 
the recursive solution: 


(5,2) (4,2) (3,2) (2,2) (1,2) (0,2) 



Each new entry in a row depends on the previous entry in the same row and, possibly, 
an entry further to the right. All these additional entries are shifted by the same 
amount, namely the weight of the current item being considered. We can therefore 
define 

    swag :: Weight → [Item] → Selection
    swag w = head · foldr step start
      where start      = replicate (w + 1) ([ ],0,0)
            step i row = zipWith better row (map (add i) (drop wi row))
                         ++ drop (w + 1 - wi) row
              where wi = weight i

This solution is of comparable speed to the thinning algorithm of Section 10.4, and
slightly faster than one based on a one-dimensional array, but it does depend on all
weights being integers, an assumption not needed in the thinning algorithm.
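
The following sketch assembles the declarations of this section into a runnable program. The item names and values in the usage note are invented for illustration; only the weights 3, 2, 2, 1 and the capacity 5 come from the example above. We also assume Nat is Int, and we give value and weight slightly more general types so that they apply to both items and selections.

```haskell
type Nat       = Int   -- assumption: Nat is represented by Int
type Name      = String
type Value     = Nat
type Weight    = Nat
type Item      = (Name, Value, Weight)
type Selection = ([Name], Value, Weight)

name :: Item -> Name
name (n, _, _) = n

-- Polymorphic in the first component so that both items and
-- selections can be inspected.
value :: (a, Value, Weight) -> Value
value (_, v, _) = v

weight :: (a, Value, Weight) -> Weight
weight (_, _, w) = w

add :: Item -> Selection -> Selection
add i (ns, v, w) = (name i : ns, value i + v, weight i + w)

better :: Selection -> Selection -> Selection
better sn1 sn2 = if value sn1 >= value sn2 then sn1 else sn2

-- Each row lists the best selections for capacities w, w-1, ..., 0.
swag :: Weight -> [Item] -> Selection
swag w = head . foldr step start
  where
    start = replicate (w + 1) ([], 0, 0)
    step i row = zipWith better row (map (add i) (drop wi row))
                 ++ drop (w + 1 - wi) row   -- capacities too small for item i
      where wi = weight i
```

With the hypothetical items [("a",10,3), ("b",7,2), ("c",6,2), ("d",2,1)] and capacity 5, the result is (["a","b"],17,5): the first two items fill the knapsack exactly.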


13.3 Minimum-cost edit sequences 

Our next example of dynamic programming concerns another way of comparing 
the similarity of two strings. One such measure, as we saw in Section 11.2, is the 
length of the longest common subsequence of the two strings. Another measure is 
to count the cost of transforming one string into the other by a sequence of simple 
edit operations. There are various possible edit operations, but we will allow just 
the following four: 

• The operation Replace x y replaces the current character x in the first string xs by 
y and then moves on to the next character of xs. It is supposed that x and y are 
different characters. 

• The operation Copy x has the same effect as Replace x x. 

• The operation Delete x deletes the current character x in xs and moves on to the 
next character. 

• The operation Insert y inserts a new character y before the current character of xs, 
and then moves on to the next character. 

These edit operations are encapsulated in the data type 

    data Op = Copy Char | Replace Char Char | Delete Char | Insert Char

The character being replaced, copied, or deleted from the first string is made explicit
in the edit operation. In this way, the edit sequence transforming the second string
into the first can be obtained by interchanging insert and delete operations and
swapping the arguments of a replace. One can also recover both the source and the
target string from the edit sequence alone. More precisely, we can define

    reconstruct :: [Op] → ([Char],[Char])
    reconstruct = foldr step ([ ],[ ])
      where step (Copy x) (us,vs)      = (x:us, x:vs)
            step (Replace x y) (us,vs) = (x:us, y:vs)
            step (Insert x) (us,vs)    = (us, x:vs)
            step (Delete x) (us,vs)    = (x:us, vs)

Each edit operation has an associated cost. We suppose that the cost of a replace is 
less than the combined costs of an insert and a delete, for otherwise there would be 
no point in having a replace operation. Furthermore, the cost of a copy operation is 
assumed to be zero; then two identical strings can be transformed into one another 
with zero cost. Here is an example where the cost of an insert or delete is 2 units
and the cost of a replace is 3 units:

    i*nstitution*
    constitue**nt
    3200000032202

The string "institution" is transformed into "constituent" by replacing i 
by c, inserting an o, copying the next six characters, replacing t by e, deleting 
the next two characters, copying n and finally inserting t. The two strings have 
been aligned by using a * character to indicate insertions or deletions. The total 
cost of this sequence of edits is 14 units, which is the smallest possible cost when 
the individual edits are costed as above. The function cost yields the sum of the 
individual edit costs: 

    cost :: [Op] → Nat
    cost = sum · map ecost

    ecost (Copy x)      = 0
    ecost (Replace x y) = 3
    ecost (Delete x)    = 2
    ecost (Insert y)    = 2
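
The worked example can be replayed mechanically. In the sketch below the edit sequence named example is our transcription of the alignment described above, the result type of cost is written as Int on the assumption that Nat is Int, and the deriving clause is added only so that values can be compared and printed.

```haskell
data Op = Copy Char | Replace Char Char | Delete Char | Insert Char
  deriving (Eq, Show)

-- Recover the source and target strings from an edit sequence.
reconstruct :: [Op] -> (String, String)
reconstruct = foldr step ([], [])
  where
    step (Copy x)      (us, vs) = (x : us, x : vs)
    step (Replace x y) (us, vs) = (x : us, y : vs)
    step (Insert x)    (us, vs) = (us, x : vs)
    step (Delete x)    (us, vs) = (x : us, vs)

ecost :: Op -> Int
ecost (Copy _)      = 0
ecost (Replace _ _) = 3
ecost (Delete _)    = 2
ecost (Insert _)    = 2

cost :: [Op] -> Int
cost = sum . map ecost

-- The edit sequence of the example: "institution" to "constituent".
example :: [Op]
example =
  [Replace 'i' 'c', Insert 'o']
  ++ map Copy "nstitu"
  ++ [Replace 't' 'e', Delete 'i', Delete 'o', Copy 'n', Insert 't']
```

Here reconstruct example returns ("institution","constituent") and cost example is 14, in agreement with the row of costs in the alignment.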


The problem of computing mce (a minimum-cost edit) is now specified by

    mce :: [Char] → [Char] → [Op]
    mce xs ys ← MinWith cost (edits xs ys)

The function edits returns all possible edit sequences:

    edits :: [Char] → [Char] → [[Op]]
    edits xs [ ]        = [map Delete xs]
    edits [ ] ys        = [map Insert ys]
    edits (x:xs) (y:ys) = [pick x y : es | es ← edits xs ys] ++
                          [Delete x : es | es ← edits xs (y:ys)] ++
                          [Insert y : es | es ← edits (x:xs) ys]
    pick x y = if x == y then Copy x else Replace x y

The primary monotonicity condition for this problem is that

    cost es1 ≤ cost es2 ⇒ cost (op:es1) ≤ cost (op:es2)

for all edit operations op, where es1 and es2 are edit sequences in edits xs ys. That
leads to the recursive formulation

    mce xs [ ]        = map Delete xs
    mce [ ] ys        = map Insert ys
    mce (x:xs) (y:ys) = minWith cost [pick x y : mce xs ys,
                                      Delete x : mce xs (y:ys),
                                      Insert y : mce (x:xs) ys]

However, we can go one step further. Provided it is available, a Copy operation at
any step is always the best possible choice. The proof of this greedy condition is
left to Exercise 13.10. That means we can rewrite the third clause of mce to read:

    mce (x:xs) (y:ys) = if x == y then Copy x : mce xs ys else
                        minWith cost [Replace x y : mce xs ys,
                                      Delete x : mce xs (y:ys),
                                      Insert y : mce (x:xs) ys]

The dependency graph for mce "abca" "bac" is pictured in Figure 13.4. There is a
single diagonal edge when two characters match; otherwise there are three edges.

[Figure 13.4 Computation of mce "abca" "bac"]

It remains to implement a suitable tabulation scheme. As with the knapsack
problem, we can compute entries row by row from right to left:

    mce xs ys = head (foldr (nextrow xs) (firstrow xs) ys)

The first row of edit operations is given by firstrow = tails · map Delete. To see how
to define nextrow, observe that the next edit sequence to be added to the new row,
say at position i, depends on one of three values: (i) the edit sequence at position
i + 1 of the new row (for a delete operation); (ii) the edit sequence at position i of
the previous row (for an insert operation); and (iii) the edit sequence at position
i + 1 of the previous row (for a replace operation). These last two values can be
obtained with the help of a zip, so we can define

    nextrow :: [Char] → Char → [[Op]] → [[Op]]
    nextrow xs y row = foldr step [Insert y : last row] xes
      where xes = zip3 xs row (tail row)
            step (x,es1,es2) row = if x == y then (Copy x : es2) : row else
                                   minWith cost [Replace x y : es2,
                                                 Delete x : head row,
                                                 Insert y : es1] : row

[Figure 13.5 Computation of lcs "abca" "bacb"]

Finally, cost computations should be memoised for efficiency, but we will leave
that as an exercise. In the worst case the time required to find the edit sequence
with minimum cost is then Θ(mn) steps, where m and n are the lengths of the two
strings.
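
Putting the tabulation scheme into executable form gives the sketch below. Costs are recomputed rather than memoised, so it does not attain the Θ(mn) bound. The definition of minWith is not given in this section and is our assumption (the leftmost element of minimum cost, via foldr1). In step, the accumulating new row is called new to avoid shadowing the previous row.

```haskell
import Data.List (tails)

data Op = Copy Char | Replace Char Char | Delete Char | Insert Char
  deriving (Eq, Show)

ecost :: Op -> Int
ecost (Copy _)      = 0
ecost (Replace _ _) = 3
ecost (Delete _)    = 2
ecost (Insert _)    = 2

cost :: [Op] -> Int
cost = sum . map ecost

-- Assumed definition: an element of a nonempty list with minimum f-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 smaller
  where smaller x y = if f x <= f y then x else y

mce :: String -> String -> [Op]
mce xs ys = head (foldr (nextrow xs) (firstrow xs) ys)

-- Entry i of the first row deletes everything from position i onwards.
firstrow :: String -> [[Op]]
firstrow = tails . map Delete

nextrow :: String -> Char -> [[Op]] -> [[Op]]
nextrow xs y row = foldr step [Insert y : last row] xes
  where
    xes = zip3 xs row (tail row)
    -- es1: same position in the previous row; es2: next position there;
    -- head new: next position in the row under construction
    step (x, es1, es2) new =
      if x == y then (Copy x : es2) : new
      else minWith cost [ Replace x y : es2
                        , Delete x : head new
                        , Insert y : es1 ] : new
```

For example, cost (mce "abca" "bac") is 6, and for the two strings of the earlier example the cost is the 14 units quoted there.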


13.4 Longest common subsequence revisited 

The above tabulation scheme for the minimum-cost edit sequence problem can be 
adapted to the longest common subsequence problem. Recall the recursive definition 
of lcs from Chapter 11:

    lcs :: Eq a ⇒ [a] → [a] → [a]
    lcs [ ] ys        = [ ]
    lcs xs [ ]        = [ ]
    lcs (x:xs) (y:ys) = if x == y then x : lcs xs ys
                        else longer (lcs (x:xs) ys) (lcs xs (y:ys))

The dependency graph for lcs "abca" "bacb" is pictured in Figure 13.5. There is a
single diagonal edge when two characters match; otherwise there are two edges. As
with mce, we can compute lcs row by row from right to left. This time, the first row
of entries is a list of empty lists and the entries of a new row each depend on one
of three further entries, the same three entries as we had with mce. Hence we can
define

    lcs xs = head · foldr (nextrow xs) (firstrow xs)

where

    firstrow xs      = replicate (length xs + 1) [ ]
    nextrow xs y row = foldr (step y) [[ ]] (zip3 xs row (tail row))
    step y (x,cs1,cs2) row = if x == y then (x:cs2) : row
                             else longer cs1 (head row) : row

The time required to find the longest common subsequence of two lists of lengths m
and n is Θ(mn) steps in the worst case, the same time as with the thinning algorithm.
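
A runnable transcription is given below as a sketch. The function longer comes from Chapter 11 and is not reproduced in this section, so the definition here is an assumption: it returns the first argument on ties, and it recomputes lengths, which a production version would memoise.

```haskell
-- Assumed definition of longer from Chapter 11 (ties favour the first list).
longer :: [a] -> [a] -> [a]
longer xs ys = if length xs >= length ys then xs else ys

lcs :: Eq a => [a] -> [a] -> [a]
lcs xs = head . foldr (nextrow xs) (firstrow xs)

-- Against an empty second list every entry is the empty subsequence.
firstrow :: [a] -> [[a]]
firstrow xs = replicate (length xs + 1) []

nextrow :: Eq a => [a] -> a -> [[a]] -> [[a]]
nextrow xs y row = foldr (step y) [[]] (zip3 xs row (tail row))

-- cs1 and cs2 come from the previous row; head new is the entry just built.
step :: Eq a => a -> (a, [a], [a]) -> [[a]] -> [[a]]
step y (x, cs1, cs2) new =
  if x == y then (x : cs2) : new
  else longer cs1 (head new) : new
```

Which of several equally long common subsequences is returned depends on how longer breaks ties, so only the length of the result is determined by the specification.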


13.5 The shuttle-bus problem 

Our final example in this chapter is another scheduling problem. Consider a
shuttle-bus that runs from an airport to a city centre. It takes on passengers only at the
airport, but it can drop them off at various points along the route. We suppose that 
the possible stops are numbered from 0 (the airport) to n (the city centre). In the 
interests of getting all passengers to their destinations as quickly as possible, the bus 
driver is willing to make up to k intermediate stops. The problem is to program the 
computer on board the bus to calculate a schedule of at most k intermediate stops 
that minimises the total cost for a given group of passengers, where the cost to a 
single passenger getting off at stop m is the absolute value of the difference between 
the desired stop number and m. 

All that is important about passengers is the number of them who wish to get off 
at a particular stop, so we define 

    type Passengers = [(Count,Stop)]
    type Count      = Nat
    type Stop       = Nat

For example, [(3,1), (10,2), (5,3), (15,4), (4,5), (10,8), (22,10)] is a possible
passenger list, indicating that three people want to get off at stop 1, ten at stop 2, and so
on. It is assumed that the passenger list is given in increasing order of stop number
and that the stops are numbered between 1 and n inclusive.

A schedule of k intermediate stops is a subsequence of 1,2,...,n-1 of length at
most k. However, it turns out to be computationally simpler to describe the journey 
in terms not of stops but of the individual ‘legs’ of the journey, where a leg is a pair 
of stops: 

    type Leg = (Stop,Stop)

For example, the subsequence [2,5,7] of three intermediate stops is represented
(assuming n = 10) by the sequence of legs [(0,2), (2,5), (5,7), (7,10)]. With this
representation, the total cost to a list of passengers for a sequence of legs is defined
by

    cost :: Passengers → [Leg] → Nat
    cost ps [ ]         = 0
    cost ps ((x,y):ls)  = legcost qs (x,y) + cost rs ls
      where (qs,rs) = span (atmost y) ps

where

    atmost y (c,s)   = s ≤ y
    legcost ps (x,y) = sum [c × min (s - x) (y - s) | (c,s) ← ps]

The leg cost of (x,y) is the cost to all passengers who wish to get off after stop x but
before or at stop y. Clearly, the closer stop is the one with smaller cost. For example,
with the above passenger list and sequence of legs, the cost is

    legcost [(3,1), (10,2)] (0,2) +
    legcost [(5,3), (15,4), (4,5)] (2,5) +
    legcost [ ] (5,7) +
    legcost [(10,8), (22,10)] (7,10)

which is 3 × (2 - 1) + (5 × (3 - 2) + 15 × (5 - 4)) + 0 + 10 × (8 - 7) = 33. In
particular, the five passengers who want to get off at stop 3 do best by walking there
from stop 2.
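
The calculation of the total cost can be reproduced with the following sketch, which assumes Nat is Int and transcribes the definitions of cost, atmost and legcost directly. The names passengers and journey are hypothetical labels for the example data above.

```haskell
type Nat        = Int   -- assumption: Nat is represented by Int
type Count      = Nat
type Stop       = Nat
type Passengers = [(Count, Stop)]
type Leg        = (Stop, Stop)

cost :: Passengers -> [Leg] -> Nat
cost _  []            = 0
cost ps ((x, y) : ls) = legcost qs (x, y) + cost rs ls
  where (qs, rs) = span (atmost y) ps   -- qs: passengers served by this leg

atmost :: Stop -> (Count, Stop) -> Bool
atmost y (_, s) = s <= y

-- Each passenger walks from the nearer end of the leg.
legcost :: Passengers -> Leg -> Nat
legcost ps (x, y) = sum [c * min (s - x) (y - s) | (c, s) <- ps]

passengers :: Passengers
passengers = [(3,1), (10,2), (5,3), (15,4), (4,5), (10,8), (22,10)]

journey :: [Leg]
journey = [(0,2), (2,5), (5,7), (7,10)]
```

Evaluating cost passengers journey gives 33, the sum computed by hand above.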

Now we can specify schedule by

    schedule :: Nat → Nat → Passengers → [Leg]
    schedule n k ps ← MinWith (cost ps) (legs n k 0)

where legs returns the set of possible sequences of legs from a given position:

    legs :: Nat → Nat → Stop → [[Leg]]
    legs n k x
      | x == n    = [[ ]]
      | k == 0    = [[(x,n)]]
      | otherwise = [(x,y):ls | y ← [x+1 .. n], ls ← legs n (k - 1) y]

When k = 0, the only possible leg is the one that goes straight from x to n without
making any intermediate stops; otherwise every possible leg beginning with x is
chosen.

The next step is to obtain a recursive definition of schedule. For the base cases
we have

    MinWith (cost ps) (legs n k n) = MinWith (cost ps) [[ ]] = [ ]
    MinWith (cost ps) (legs n 0 x) = MinWith (cost ps) [[(x,n)]] = [(x,n)]

For the recursive case of legs we can reason

      cost ps ((x,y):ls)
    =   { definition of cost with (qs,rs) = span (atmost y) ps }
      legcost qs (x,y) + cost rs ls
    ≤   { assuming cost rs ls ≤ cost rs ls′ }
      legcost qs (x,y) + cost rs ls′
    =   { definition of cost }
      cost ps ((x,y):ls′)

Hence

    cost rs ls ≤ cost rs ls′ ⇒ cost ps ((x,y):ls) ≤ cost ps ((x,y):ls′)

That leads to the following recursive definition of schedule: 

schedule n k ps = process ps k 0 where
  process ps k x
    | x == n    = [ ]
    | k == 0    = [(x,n)]
    | otherwise = minWith (cost ps) [(x,y) : process (cut y ps) (k − 1) y
                                    | y ← [x+1 .. n]]
  cut y = dropWhile (atmost y)

The next step is tabulation. Let (k,x) represent the call process (cut x ps) k x. Then
(k,x) depends on all of (k−1,x+1), (k−1,x+2), ..., (k−1,n). This is a layered
recursion, so we can turn the problem into one of finding a shortest path in a layered
network. Alternatively, we can build a table row by row. Suppose we define

table ps k = [process (cut x ps) k x | x ← [0 .. n]]

In particular,

schedule n k ps = head (table ps k)

The bottom row of the table is given by

table ps 0 = [[(x,n)] | x ← [0 .. n−1]] ++ [[ ]]

It remains to show how table ps k is computed from table ps (k−1). The idea is to
define step so that

table ps k = step (table ps (k − 1))

To this end, let ptails return the proper tails of a list, that is, all the tails except the
list itself:

ptails [ ] = [ ]
ptails (x : xs) = xs : ptails xs

Then we can define

step t = zipWith entry [0 .. n−1] (ptails t) ++ [[ ]]
entry x ts = minWith (cost (cut x ps)) (zipWith (:) [(x,y) | y ← [x+1 .. n]] ts)

Putting these pieces together, we arrive at the final algorithm

schedule n k ps = head (apply k step start)
  where
    start = [[(x,n)] | x ← [0 .. n−1]] ++ [[ ]]
    step t = zipWith entry [0 .. n−1] (ptails t) ++ [[ ]]
    entry x ts = minWith (cost (cut x ps))
                   (zipWith (:) [(x,y) | y ← [x+1 .. n]] ts)

The algorithm can be made more efficient in various ways, including by memoising 
cost, but we will leave these optimisations as exercises. 
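
The pieces of this section can be assembled into a small runnable sketch. It is an illustration under stated assumptions rather than the book's own listing: Nat is taken to be Int, minWith and apply are given straightforward definitions, and the passenger list in the usage note is a hypothetical one chosen to agree with the worked cost of 33.

```haskell
type Nat = Int
type Stop = Nat
type Leg = (Stop, Stop)
type Passengers = [(Nat, Stop)]   -- (number of passengers, destination), sorted by destination

atmost :: Stop -> (Nat, Stop) -> Bool
atmost y (_, s) = s <= y

-- Walking cost for passengers who leave the bus at the nearer end of the leg.
legcost :: Passengers -> Leg -> Nat
legcost ps (x, y) = sum [c * min (s - x) (y - s) | (c, s) <- ps]

cost :: Passengers -> [Leg] -> Nat
cost _ [] = 0
cost ps ((x, y) : ls) = legcost qs (x, y) + cost rs ls
  where (qs, rs) = span (atmost y) ps

-- Assumed helper: returns the first element with minimum f-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith f = foldr1 (\x y -> if f x <= f y then x else y)

-- Assumed helper: apply a function k times.
apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply k f = f . apply (k - 1) f

ptails :: [a] -> [[a]]
ptails [] = []
ptails (_ : xs) = xs : ptails xs

-- Specification: every sequence of legs from x with at most k intermediate stops.
legs :: Nat -> Nat -> Stop -> [[Leg]]
legs n k x
  | x == n = [[]]
  | k == 0 = [[(x, n)]]
  | otherwise = [(x, y) : ls | y <- [x + 1 .. n], ls <- legs n (k - 1) y]

-- Tabulated algorithm: row k of the table is computed from row k-1.
schedule :: Nat -> Nat -> Passengers -> [Leg]
schedule n k ps = head (apply k step start)
  where
    start = [[(x, n)] | x <- [0 .. n - 1]] ++ [[]]
    step t = zipWith entry [0 .. n - 1] (ptails t) ++ [[]]
    entry x ts =
      minWith (cost (cut x ps)) (zipWith (:) [(x, y) | y <- [x + 1 .. n]] ts)
    cut y = dropWhile (atmost y)
```

With the hypothetical list ps = [(3,1),(10,2),(5,3),(15,4),(4,5),(10,8),(11,10)], the expression cost ps [(0,2),(2,5),(5,7),(7,10)] evaluates to 33, and cost ps (schedule 10 3 ps) should agree with minimum (map (cost ps) (legs 10 3 0)), since the tabulation only reorganises the recursive definition.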


13.6 Chapter notes 

The story behind the term ‘dynamic programming’ is described in [4]. An early 
account of dynamic programming by Bellman appears in his book [1]. Various 
tabulation schemes for recursive programs are presented in [2]. The minimum edit 
distance problem has applications in computational biology and is discussed in 
most books on stringology, including [5]; see also [6, 8]. The shuttle-bus problem
appears, in different guises, in [3] and as an elevator problem in [7]. The Wikipedia
entry on dynamic programming contains a wealth of other examples.

Exercises

Exercise 13.1 Let T(n) denote the number of additions in computing fib n from its
recursive definition. Given that fib n = Θ(φⁿ), where φ = (1 + √5)/2 is the Golden
Ratio, prove that T(n) = Θ(φⁿ).

Exercise 13.2 Give an efficient one-line definition of the function fibs that returns
the infinite list of all the Fibonacci numbers.

Exercise 13.3 The following two identities hold for n>2: 

fib (2 × n) = fib n × (2 × fib (n + 1) − fib n)
fib (2 × n + 1) = fib n × fib n + fib (n + 1) × fib (n + 1)

Using these facts, show how to compute fib n in O(log n) steps. As a hint, note that
the linear-time algorithm for fib can be phrased in the form

fib = fst · foldr step (0,1) · unary
  where step k (a,b) = (b, a + b)
        unary n = [1 .. n]

The logarithmic version is obtained by modifying the definition of step and replacing
unary by binary, where binary returns the binary expansion of a number, least
significant digit first:

binary n = if n == 0 then [ ] else r : binary q
  where (q,r) = n `divMod` 2

For example, binary 6 = [0,1,1]. 

Exercise 13.4 Consider the function 

fob :: Nat → Integer
fob n = if n ≤ 2 then fromIntegral n else fob (n − 1) + fob (n − 3)

Show how to evaluate fob in linear time.

Exercise 13.5 The Stirling numbers can be defined for 0 ≤ r ≤ n by the recurrence

Stirling :: (Nat,Nat) → Integer
Stirling (n,r)
  | r == n    = 1
  | r == 0    = 0
  | otherwise = fromIntegral r × Stirling (n − 1, r) + Stirling (n − 1, r − 1)

Give a suitable tabulation scheme for computing Stirling efficiently.

Exercise 13.6 An extreme form of dependency graph arises when every value
depends on every previous value, as in the function f, where

f n = if n == 0 then 1 else sum (map f [0 .. n − 1])

How would you compute this particular recursion efficiently?

Exercise 13.7 Why does the monotonicity condition 

value sn1 ≤ value sn2 ⇒ value (add i sn1) ≤ value (add i sn2)

hold for the knapsack problem? 

Exercise 13.8 Here is a solution to the knapsack problem based on a one-dimensional 
array, similar to the one in the text except that values in each row go from left to 
right: 

swag w items = a ! w
  where
    a = foldr step start items
    start = listArray (0,w) (replicate (w + 1) ([ ],0,0))
    step item a = ...

Your task is to define step. 

Exercise 13.9 Write down all the possible values of mce "abca" "bac". 

Exercise 13.10 The purpose of this exercise is to establish the greedy condition for 
the minimum-cost edit problem by showing that, at any point in the sequence, if 
the two remaining strings begin with the same character, then starting with a copy 
operation always leads to a best possible solution. Suppose a best sequence does 
not begin with a copy, so it has to begin with either a delete or an insert (a replace is 
not possible as the first two characters are the same). The two situations are dual, so 
suppose it begins with k delete operations, where k>0. Thereafter, there are three 
possibilities for the next edit operation: a copy (if available), a replace, or an insert. 
What alternative edit sequence beginning with a copy and with the same cost is 
possible in the first two cases? What alternative sequence beginning with a copy is 
possible in the third case? The necessary assumption is that c ≤ r ≤ d + i, where c
is the cost of a copy, r the cost of a replace, d the cost of a delete, and i the cost of 
an insert. 

Exercise 13.11 We can memoise cost computations in the final algorithm for the 
edit sequence problem by pairing edit sequences with their costs. In particular, let 
us introduce

type Pair = (Nat, [Op])

Now the first row is defined by

firstrow :: [Char] → [Pair]
firstrow xs = foldr nextentry [(0, [ ])] xs
  where nextentry x row = cons (Delete x) (head row) : row

where cons is defined by

cons op (k, es) = (ecost op + k, op : es)

Write down the modified definition of nextrow (hint: it also uses cons) and hence
construct a new definition of mce.

Exercise 13.12 By definition a distance function d :: A × A → ℝ⁺, where ℝ⁺ is the
set of nonnegative real numbers, is a function with the following four properties:

1. d(x,y) ≥ 0.

2. d(x,y) = 0 if and only if x = y.

3. d(x,y) = d(y,x).

4. d(x,y) ≤ d(x,z) + d(z,y) for all z.

Show that dist is a distance function, where

dist (xs,ys) = cost (mce xs ys)

For this reason the minimum-cost edit sequence problem is often referred to as the 
edit distance problem. 

Exercise 13.13 Let k denote the length of the longest common subsequence of xs 
and ys. Show that 

cost (mce xs ys) ≤ length xs + length ys − 2 × k

given that the only edit operations allowed are copy, insert, and delete, with costs 0,
1, and 1, respectively. Show that the inequality can be strengthened to an equality.

Exercise 13.14 It may seem in the light of Exercise 13.10 that a minimum edit 
sequence can be obtained from a longest upsequence in the following way: partition 
the two sequences according to their longest upsequence, giving 

xs_0 ++ [x_0] ++ xs_1 ++ ··· ++ xs_{n−1} ++ [x_{n−1}] ++ xs_n

ys_0 ++ [x_0] ++ ys_1 ++ ··· ++ ys_{n−1} ++ [x_{n−1}] ++ ys_n

where [x_0, ..., x_{n−1}] is the longest common subsequence. For example,

"bdacb" = "b" ++ "d" ++ "" ++ "a" ++ "c" ++ "b" ++ ""

"ddacc" = "" ++ "d" ++ "d" ++ "a" ++ "c" ++ "" ++ "c"

The sequences xsj and ysj have no characters in common, so their minimum edit 
sequence can be determined by applying as many replace operations as possible, 
followed by either a number of deletes or a number of inserts. Does this idea work?

Exercise 13.15 To define an efficient version of the shuttle-bus function schedule
we need to split up the passenger list by defining

split n ps = [cut 0 ps, cut 1 ps, ..., cut n ps]

Give a definition of split.

Now we can memoise cost computations by defining

schedule :: Nat → Nat → Passengers → [Leg]
schedule n k ps = extract (apply k step start) where
  extract = snd · head
  start = zipWith entry pss [0 .. n − 1] ++ [(0, [ ])]
    where entry ps x = (legcost ps (x,n), [(x,n)])
  pss = split n ps
  step t = ...

Each sequence of legs is paired with its cost. Define the local value step t.

Exercise 13.16 Recall the coin-changing problem of Section 7.3:

mkchange :: [Denom] → Nat → Tuple
mkchange ds ← MinWith count · mktuples ds
  where count = sum and

mktuples [1] n = [[n]]
mktuples (d : ds) n = concat [map (c:) (mktuples ds (n − c × d))
                             | c ← [0 .. n div d]]

What is the monotonicity condition that yields a recursive definition of mkchange?
Write down the recursive definition and suggest a suitable tabulation scheme.


Answers 

Answer 13.1 We have T(0) = T(1) = 0 and T(n) = T(n − 1) + T(n − 2) + 1 for
n ≥ 2. By induction one can then show T(n) = fib (n + 1) − 1. Since fib n = Θ(φⁿ),
where φ is the Golden Ratio, the result now follows.

Answer 13.2 We have

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Answer 13.3 We have

fib = fst · foldr step (0,1) · binary
  where step k (a,b) = if k == 0 then (c,d) else (d, c + d)
          where c = a × (2 × b − a)
                d = a × a + b × b

Answer 13.4 The solution is the same as the one for fib except that we maintain
three values at each step:

fob n = fst3 (apply n step (0,1,2))
  where step (a,b,c) = (b, c, a + c)
        fst3 (a,b,c) = a
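
Answers 13.2 to 13.4 can be transcribed into plain Haskell and checked against one another. The following is a minimal sketch, assuming Nat is represented by Int and that apply n f means n-fold application (a helper not shown in these answers); fobSpec is a name introduced here for the recursive definition from Exercise 13.4.

```haskell
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Binary expansion, least significant digit first.
binary :: Int -> [Int]
binary n = if n == 0 then [] else r : binary q
  where (q, r) = n `divMod` 2

-- Logarithmic-time Fibonacci: the pair (a, b) holds (fib m, fib (m + 1)).
fib :: Int -> Integer
fib = fst . foldr step (0, 1) . binary
  where
    step k (a, b) = if k == 0 then (c, d) else (d, c + d)
      where c = a * (2 * b - a)   -- fib (2 * m)
            d = a * a + b * b     -- fib (2 * m + 1)

-- Assumed helper: apply a function n times.
apply :: Int -> (a -> a) -> a -> a
apply 0 _ = id
apply n f = f . apply (n - 1) f

-- Recursive definition from Exercise 13.4 (exponential time).
fobSpec :: Int -> Integer
fobSpec n = if n <= 2 then fromIntegral n else fobSpec (n - 1) + fobSpec (n - 3)

-- Linear-time version keeping three consecutive values.
fob :: Int -> Integer
fob n = fst3 (apply n step (0, 1, 2))
  where step (a, b, c) = (b, c, a + c)
        fst3 (a, _, _) = a
```

For instance, fib 6 evaluates to 8, in line with binary 6 = [0,1,1], and map fib [0..20] agrees with take 21 fibs.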

Answer 13.5 The simplest solution is to use a two-dimensional array: 

Stirling (n,r) = a ! (n,r)
  where a = tabulate f ((0,0), (n,r))
        f (i,j) | i == j    = 1
                | j == 0    = 0
                | otherwise = fromIntegral j × a ! (i − 1, j) + a ! (i − 1, j − 1)

As an alternative we can go for a solution with the same shape as binom:

Stirling (n,r) = head (apply (n − r) step (replicate (r + 1) 1))
  where step row = scanr1 (+) (zipWith (×) [r′, r′ − 1 .. 0] row)
        r′ = fromIntegral r

This method computes row (n,r), (n − 1, r − 1), ..., (n − r, 0) from the previous row
(n − 1, r), (n − 2, r − 1), ..., (n − r − 1, 0), starting with the row (r,r), ..., (0,0), all of
whose entries are 1.

Answer 13.6 A trick question, because f n = 2^(n−1) for n ≥ 1.

Answer 13.7 Because value (add i sn) = value i + value sn.

Answer 13.8 The definition is 

step item a = a // [(j, next j item) | j ← [0 .. w]]
  where next j i = if j < wi then a ! j else better (a ! j) (add i (a ! (j − wi)))
          where wi = weight i

The array-based solution has the same asymptotic complexity as the list-based
version but is slightly slower.

Answer 13.9 There are three answers, all with cost 6:

abca* *abca ab*ca 
*b*ac ba*c* *bac* 


Answer 13.10 In the first two cases we can start with a copy and k deletes, to
arrive at the same spot as k deletes followed by either a copy or a replace. Since
c ≤ r ⇒ c + kd ≤ kd + r, the first sequence gives an edit sequence with smaller
cost. In the third case we can start with a copy and k − 1 deletes. This time, we have
c ≤ d + i ⇒ c + (k − 1)d ≤ kd + i, so again the first sequence has smaller cost.

Answer 13.11 The definition is basically the same as before except that (:) is
replaced by cons:

nextrow :: [Char] → Char → [Pair] → [Pair]
nextrow xs y row = foldr step [cons (Insert y) (last row)] xes
  where
    xes = zip3 xs row (tail row)
    step (x, es1, es2) row = if x == y then cons (Copy x) es2 : row else
      minWith fst [cons (Insert y) es1,
                   cons (Replace x y) es2,
                   cons (Delete x) (head row)] : row

Now we have

mce xs ys = extract (foldr (nextrow xs) (firstrow xs) ys)
  where extract = snd · head

Answer 13.12 The first two properties are immediate. Changing deletes into inserts 
and vice versa, and swapping the arguments of a replace, we get an edit sequence 
of the same cost for changing the second list into the first, so the third property is 
satisfied. For the fourth and final property we have 

cost (mce xs ys) ≤ cost (mce xs zs) + cost (mce zs ys)

because we can concatenate a minimum edit sequence turning xs into zs with a 
minimum edit sequence turning zs into ys to get an edit sequence turning xs into ys. 

Answer 13.13 Let zs be a longest common subsequence of xs and ys. Construct an 
edit sequence that deletes all elements of xs not in zs, inserts all elements of ys not 
in zs, and copies the common elements. The cost of this edit sequence is at most 

(length xs − length zs) + (length ys − length zs)

and so a minimum cost edit sequence is also bounded by this quantity. To show 
equality we have to prove there is no cheaper edit sequence. Given a minimum 
sequence es, consider the string zs of length k that results from performing all the 
deletes in es on xs. Since ys can be constructed from zs by applying insertions alone, 
it follows that zs is a common subsequence of xs and ys. Hence 

cost (mce xs ys) ≥ length xs + length ys − 2 × k

Since a longest common subsequence has length at least k, equality is established.

Answer 13.14 No, not as stated. For the example strings we would obtain

b d * a c b
* d d a c c
2 0 2 0 0 3

with cost 7. But a better edit is given by

b d a c b
d d a c c
3 0 0 0 3

with cost 6.

Answer 13.15 One sensible way of defining split is as follows: 

split n ps = scanl op ps [1 .. n]
  where op qs x = dropWhile (atmost x) qs

The definition of step is

step t = zipWith3 entry pss [0 .. n − 1] (ptails t) ++ [(0, [ ])]
entry ps x ts = minWith fst (zipWith cons [x + 1 .. n] ts) where
  cons y (c, ls) = (legcost (takeWhile (atmost y) ps) (x,y) + c, (x,y) : ls)

Answer 13.16 The monotonicity condition is that 

count cs1 ≤ count cs2 ⇒ count (c : cs1) ≤ count (c : cs2)

That means

(c:) · MinWith count ← MinWith count · map (c:)

Hence we can define

mkchange [1] n = [n]
mkchange (d : ds) n = minWith count [c : mkchange ds (n − c × d)
                                    | c ← [0 .. n div d]]

Let (k,n) denote the call mkchange (drop k ds) n. Then (k,n) depends on all of
(k − 1, n), (k − 1, n − d), (k − 1, n − 2d), .... The recursion is layered, so we can use
the layered network algorithm for the tabulation scheme. Alternatively, one can
compute the partial solutions row by row.

A surprising variety of subtly different algorithms arises from the single idea of
trying to bracket an expression X1 ⊗ X2 ⊗ ··· ⊗ Xn in the best possible way. We
assume that ⊗ is an associative operation, so the manner in which the brackets are
inserted does not affect the expression’s value. However, different bracketings may 
have different costs, and the aim of the exercise is to find one whose cost is as small 
as possible. Depending on how the cost is defined, finding the best solution may 
take constant, linear, linearithmic, quadratic, or cubic time. 

Here is a simple example; others will be given later on. Take ⊗ to be matrix
multiplication, an associative operation but not in general a commutative one. The
cost of multiplying a p × q matrix by a q × r matrix is Θ(p × q × r) additions and
multiplications, and the result is a p × r matrix. Now consider the four matrices
X1, X2, X3, X4 with the following dimensions:

(10,20), (20,30), (30,5), (5,50) 

With the cost taken as exactly p × q × r, the five possible ways of bracketing the
four matrices have costs 47500, 18000, 28500, 6500, and 10000, the best one being
(X1 ⊗ (X2 ⊗ X3)) ⊗ X4 with cost

20 × 30 × 5 + 10 × 20 × 5 + 10 × 5 × 50 = 6500

There is no obvious method for bracketing the matrices to achieve minimum cost. 
Greedy strategies, like doing the cheapest (or most expensive) multiplication first, do 
not work. However, as we will see later on, there is a fairly straightforward dynamic 
programming algorithm to compute the best bracketing, one whose running time is
Θ(n³) steps for n matrices.

The right way of phrasing the bracketing problem is simply to ask for a leaf- 
labelled binary tree of minimum cost with a given list as fringe. Each bracketing 
corresponds to a particular tree. For simplicity, the elements at the leaves of the tree 
are taken to be the sizes of the objects to be bracketed, not the objects themselves. 
We will refer to such sizes as weights to avoid confusion with the use of size to
describe the number of nodes in a tree. Problems like this were considered in
Chapter 8. In particular we looked at Huffman coding, which can be regarded as a
version of optimal bracketing in which ⊗ is assumed to be commutative as well as
associative, so the fringe can be any permutation of the given list. In this chapter 
we will also tackle the restricted version of Huffman coding without commutativity, 
in which the fringe has to be exactly the given list. Another example that can be 
solved using the techniques of this chapter is to find an optimum binary search tree, 
a problem we will tackle in Section 14.5. 


14.1 A cubic-time algorithm 

The problem is based on the following data type of binary trees: 

data Tree a = Leaf a | Fork (Tree a) (Tree a)

We will refer to such trees as leaf trees to avoid confusion with other kinds of tree 
we will need later on. A leaf tree will be displayed using parentheses. For example, 
the leaf tree 



is displayed as (((5 6) 7) ((1 (2 3)) 4)). 

Given Weight as the type of weights, the function met determines a tree with 
minimum cost: 

met :: [Weight] → Tree Weight
met ← MinWith cost · mktrees

The function mktrees returns a list of all possible trees with a given fringe. Two 
definitions were given in Chapter 8, but here is another way: 

mktrees :: [a] → [Tree a]
mktrees [w] = [Leaf w]
mktrees ws = [Fork t1 t2
             | (us, vs) ← splitsn ws, t1 ← mktrees us, t2 ← mktrees vs]

The function splitsn (see Exercise 8.7) splits a list of length at least two into two
nonempty lists in all possible ways. The above recursive definition directs us towards
a dynamic programming solution, while the inductive definitions of Chapter 8
suggest heading for a greedy or thinning algorithm. In any case, as mentioned in
Exercise 8.8, there are

\binom{2n-2}{n-1} \frac{1}{n}

trees with a fringe of length n, so all definitions of mktrees take exponential time.

It remains to define cost. There is a range of possible definitions, but we will only 
consider cost functions that conform to the following general scheme: 

type Cost = Nat

cost :: Tree Weight → Cost
cost (Leaf w) = 0
cost (Fork t1 t2) = cost t1 + cost t2 + f (weight t1) (weight t2)

weight :: Tree Weight → Weight
weight (Leaf w) = w
weight (Fork t1 t2) = g (weight t1) (weight t2)

Thus the cost of forming a tree is the sum of the costs of forming its two component
subtrees plus some function f of their weights. The weight of a leaf is the value at
the leaf, while the weight of a fork is some further function g of the weights of its 
component trees. For the matrix multiplication problem we have the definitions 

type Weight = (Nat,Nat)

f :: Weight → Weight → Cost
f (p,q) (q′,r) | q == q′ = p × q × r

g :: Weight → Weight → Weight
g (p,q) (q′,r) | q == q′ = (p,r)

Other interesting instantiations of f and g will be given later on. We suppose
throughout that g is an associative operation, so two trees have the same weight 
if they have the same fringe. This fact alone is sufficient for us to write down a 
recursive definition of met and obtain a cubic-time solution by tabulation. Since the 
weights of all trees in mktrees ws are the same, we have 

cost u1 ≤ cost u2 ∧ cost v1 ≤ cost v2 ⇒ cost (Fork u1 v1) ≤ cost (Fork u2 v2)

where u1 and u2 are trees in mktrees us, and v1 and v2 are trees in mktrees vs. That
means we can refine met to read

met [w] = Leaf w
met ws = minWith cost [Fork (met us) (met vs) | (us, vs) ← splitsn ws]
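
The specification and its recursive refinement can be tried out on the four matrices from the start of the chapter. The sketch below makes a few assumptions: Nat is taken to be Int, splitsn is given one plausible definition (the text only refers to Exercise 8.7 for it), minWith returns the first minimum, and explicit error clauses are added to f and g for mismatched dimensions.

```haskell
type Nat = Int
type Cost = Nat
type Weight = (Nat, Nat)            -- matrix dimensions (rows, columns)

data Tree a = Leaf a | Fork (Tree a) (Tree a)

-- Cost of multiplying a p x q matrix by a q x r matrix, and the resulting dimensions.
f :: Weight -> Weight -> Cost
f (p, q) (q', r)
  | q == q' = p * q * r
  | otherwise = error "f: dimension mismatch"

g :: Weight -> Weight -> Weight
g (p, q) (q', r)
  | q == q' = (p, r)
  | otherwise = error "g: dimension mismatch"

cost :: Tree Weight -> Cost
cost (Leaf _) = 0
cost (Fork t1 t2) = cost t1 + cost t2 + f (weight t1) (weight t2)

weight :: Tree Weight -> Weight
weight (Leaf w) = w
weight (Fork t1 t2) = g (weight t1) (weight t2)

-- Assumed definition: all ways of splitting a list into two nonempty parts.
splitsn :: [a] -> [([a], [a])]
splitsn ws = [splitAt k ws | k <- [1 .. length ws - 1]]

mktrees :: [a] -> [Tree a]
mktrees [w] = [Leaf w]
mktrees ws =
  [Fork t1 t2 | (us, vs) <- splitsn ws, t1 <- mktrees us, t2 <- mktrees vs]

-- Assumed helper: returns the first element with minimum h-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith h = foldr1 (\x y -> if h x <= h y then x else y)

-- Recursive refinement: exponential time, but no longer builds every tree.
met :: [Weight] -> Tree Weight
met [w] = Leaf w
met ws = minWith cost [Fork (met us) (met vs) | (us, vs) <- splitsn ws]

example :: [Weight]
example = [(10, 20), (20, 30), (30, 5), (5, 50)]
```

With these definitions map cost (mktrees example) gives [47500,18000,28500,6500,10000], the five costs quoted earlier, and cost (met example) is 6500.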

Assuming cost takes constant time, the running time T{n) of this version of met for 
a fringe of length n satisfies

T(n) = \sum_{k=1}^{n-1} (T(k) + T(n-k)) + \Theta(n)

with solution T(n) = Θ(3ⁿ) (see Exercise 14.2).

The next task is to find some suitable tabulation scheme. As a first step we can 
make the computation more efficient by encoding a tree as a triple of values, the 
cost of the tree, its weight, and the tree itself: 

type Triple = (Cost, Weight, Tree Weight)

With cost, weight, and tree now returning the first, second, and third components of
a triple, we have

met :: [Weight] → Tree Weight
met = tree · triple

triple :: [Weight] → Triple
triple [w] = (0, w, Leaf w)
triple ws = minWith cost [fork (triple us) (triple vs) | (us, vs) ← splitsn ws]

fork :: Triple → Triple → Triple
fork (c1,w1,t1) (c2,w2,t2) = (c1 + c2 + f w1 w2, g w1 w2, Fork t1 t2)

The simplest way of implementing a tabulation scheme is to use a two-dimensional 
array. The idea is to store the values of 

table :: (Int,Int) → [Weight] → Triple
table (i,j) = triple · drop (i − 1) · take j

in an array. The value of table (i,j) is the solution when the inputs are the elements
of the segment w_i, w_{i+1}, ..., w_j for 1 ≤ i ≤ j ≤ n. The array-based algorithm takes
the form

met ws = tree (table (1,n)) where
  n = length ws
  weights = listArray (1,n) ws
  table (i,j)
    | i == j = (0, weights ! i, Leaf (weights ! i))
    | i < j  = minWith cost [fork (t ! (i,k)) (t ! (k + 1, j)) | k ← [i .. j − 1]]
  t = tabulate table ((1,1), (n,n))

The function tabulate was defined in the previous chapter:

tabulate :: Ix i ⇒ (i → e) → (i,i) → Array i e
tabulate f bounds = array bounds [(x, f x) | x ← range bounds]

New entries to table are computed by looking up other entries in the array. Another
array, weights, is used solely for quick access to the given weights.

Assuming f and g take constant time, it takes Θ(j − i) steps to compute entry (i,j)
of the array for i ≤ j, so the total time T(n) is given by

T(n) = \sum_{i=1}^{n} \sum_{j=i}^{n} \Theta(j-i) = \Theta(n^3)

In summary, the above tabulation scheme requires cubic time and quadratic space.
The only assumption we made was that the function g for combining weights was
associative. We can do better if we suppose more about f and g, and that is the topic
of the following section.
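
The array-based scheme of this section can be written out as a self-contained program. This is a sketch under assumptions: it fixes f and g to the matrix-multiplication instance, represents Nat by Int, and adds a function best (a name introduced here) that exposes the whole triple for the full segment so that the cost can be inspected. It relies on the elements of an array built with array being evaluated lazily, so entries with i > j are never computed.

```haskell
import Data.Array

type Cost = Int
type Weight = (Int, Int)            -- matrix dimensions (rows, columns)
data Tree a = Leaf a | Fork (Tree a) (Tree a)
type Triple = (Cost, Weight, Tree Weight)

f :: Weight -> Weight -> Cost
f (p, q) (_, r) = p * q * r         -- inner dimensions are assumed to match

g :: Weight -> Weight -> Weight
g (p, _) (_, r) = (p, r)

cost :: Triple -> Cost
cost (c, _, _) = c

tree :: Triple -> Tree Weight
tree (_, _, t) = t

fork :: Triple -> Triple -> Triple
fork (c1, w1, t1) (c2, w2, t2) = (c1 + c2 + f w1 w2, g w1 w2, Fork t1 t2)

-- Assumed helper: returns the first element with minimum h-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith h = foldr1 (\x y -> if h x <= h y then x else y)

tabulate :: Ix i => (i -> e) -> (i, i) -> Array i e
tabulate h bnds = array bnds [(x, h x) | x <- range bnds]

-- Triple for the whole (nonempty) input; entry (i,j) covers the segment w_i .. w_j.
best :: [Weight] -> Triple
best ws = table (1, n)
  where
    n = length ws
    weights = listArray (1, n) ws
    table (i, j)
      | i == j = (0, weights ! i, Leaf (weights ! i))
      | i < j = minWith cost [fork (t ! (i, k)) (t ! (k + 1, j)) | k <- [i .. j - 1]]
      | otherwise = error "table: entries with i > j are never demanded"
    t = tabulate table ((1, 1), (n, n))

met :: [Weight] -> Tree Weight
met = tree . best
```

For the four matrices with dimensions (10,20), (20,30), (30,5), (5,50), cost (best ...) returns 6500, matching the best bracketing found by exhaustive search.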


14.2 A quadratic-time algorithm 

It is possible to shave a factor of n off the running time if we make some more
assumptions about f and g. Let r(i,j) denote the location of the first best split for
the segment w_i, ..., w_j of the input. That is, if r = r(i,j), then r is the smallest integer
in the range i ≤ r < j such that (w_i ... w_r) (w_{r+1} ... w_j) is the top-level split in a best
bracketing. We focus on the smallest r because our standard definition of minWith
happens to return the smallest best split, but the result below also holds when r
is the largest position for a best split. In either case, r(i,i) is undefined because a
single value cannot be split. However, we can set r(i,i) = i for completeness.

The result we want to prove is that, under certain conditions on f and g, the
function r is monotonic; in symbols,

r(i, j−1) ≤ r(i,j) ≤ r(i+1, j)    (14.1)

for i < j. The proof is postponed to Section 14.4. That means we can revise the
tabulation of met to manipulate quadruples of values, the first component of which 
records the position of a best split. With cost, weight, and tree now returning the 
second, third, and fourth components of a quadruple, and root returning the first, 
we have 

met ws = tree (table (1,n)) where
  n = length ws
  weights = listArray (1,n) ws
  table (i,j)
    | i == j     = (i, 0, weights ! i, Leaf (weights ! i))
    | i + 1 == j = fork i (t ! (i,i)) (t ! (j,j))
    | i + 1 < j  = minWith cost [fork k (t ! (i,k)) (t ! (k + 1, j))
                                | k ← [r (i, j − 1) .. r (i + 1, j)]]
  r (i,j) = root (t ! (i,j))
  t = tabulate table ((1,1), (n,n))

fork k (_,c1,w1,t1) (_,c2,w2,t2) = (k, c1 + c2 + f w1 w2, g w1 w2, Fork t1 t2)

The case i + 1 = j has to be treated separately (see Exercise 14.4). The monotonicity
of r is exploited in the third clause defining table. It is immediate from the definition
that it takes Θ(r(i+1, j) − r(i, j−1)) steps to compute entry (i,j) of the table when
i + 1 < j. The total time T(n) needed to compute entry (1,n) can be estimated by
counting the cost of computing each entry of the table along each diagonal d, where
d = j − i:

T(n) = \Theta(n) + \sum_{d=2}^{n-1} \sum_{i=1}^{n-d} \Theta(r(i+1,i+d) - r(i,i+d-1))

The first two diagonals, d = 0 and d = 1, can be computed in Θ(n) steps. We have

\sum_{i=1}^{n-d} (r(i+1,i+d) - r(i,i+d-1)) = r(n-d+1,n) - r(1,d) = \Theta(n)

since i ≤ r(i,j) ≤ j. That gives T(n) = Θ(n²).
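
The effect of restricting the candidate splits can be observed by running the quadruple-based tabulation with and without the restriction. The sketch below is an illustration only: it assumes the instance f = g = (+) on integer weights, and introduces a function solve with a Boolean flag narrow (both names are hypothetical) that switches between the restricted range given by (14.1) and the full range i .. j−1.

```haskell
import Data.Array

type Cost = Int
type Weight = Int
data Tree a = Leaf a | Fork (Tree a) (Tree a)
type Quad = (Int, Cost, Weight, Tree Weight)   -- (root, cost, weight, tree)

-- Assumed instance for illustration: f = g = (+) on nonnegative weights.
f, g :: Weight -> Weight -> Weight
f = (+)
g = (+)

root :: Quad -> Int
root (r, _, _, _) = r

cost :: Quad -> Cost
cost (_, c, _, _) = c

fork :: Int -> Quad -> Quad -> Quad
fork k (_, c1, w1, t1) (_, c2, w2, t2) = (k, c1 + c2 + f w1 w2, g w1 w2, Fork t1 t2)

-- Assumed helper: returns the first element with minimum h-value.
minWith :: Ord b => (a -> b) -> [a] -> a
minWith h = foldr1 (\x y -> if h x <= h y then x else y)

tabulate :: Ix i => (i -> e) -> (i, i) -> Array i e
tabulate h bnds = array bnds [(x, h x) | x <- range bnds]

-- narrow = True uses the range from (14.1); narrow = False tries every split.
solve :: Bool -> [Weight] -> Quad
solve narrow ws = t ! (1, n)
  where
    n = length ws
    weights = listArray (1, n) ws
    table (i, j)
      | i == j = (i, 0, weights ! i, Leaf (weights ! i))
      | i + 1 == j = fork i (t ! (i, i)) (t ! (j, j))
      | i + 1 < j = minWith cost [fork k (t ! (i, k)) (t ! (k + 1, j)) | k <- splits (i, j)]
      | otherwise = error "table: entries with i > j are never demanded"
    splits (i, j)
      | narrow = [r (i, j - 1) .. r (i + 1, j)]
      | otherwise = [i .. j - 1]
    r (i, j) = root (t ! (i, j))
    t = tabulate table ((1, 1), (n, n))
```

For the weights [1,2,3] both variants return cost 9, with top-level split after the second element. More generally cost (solve True ws) and cost (solve False ws) should coincide whenever the conditions discussed next hold for the chosen f and g.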

It remains to give the conditions on f and g that ensure (14.1). In Section 14.4 we
will show that (14.1) follows from the quadrangle inequality (QI)

i ≤ i′ ≤ j ≤ j′ ⇒ C(i,j) + C(i′,j′) ≤ C(i,j′) + C(i′,j)    (14.2)

where C(i,j) is the minimum cost of bracketing the segment w_i, ..., w_j of the input
w_1, ..., w_n. In words, the sum of the costs for two overlapping intervals is at most the
cost of the union of the intervals plus the cost of their intersection. The conditions on
f and g are those required to ensure the quadrangle inequality holds. The simplest
conditions are when f and g are the same function. It is possible to formulate
conditions when f ≠ g (see Exercise 14.9), but they are rather complicated and it is
difficult to find examples to satisfy them, so we will consider only the case f = g.

There are two conditions. Setting f = g = (•) for readability, the first condition
is that the quadrangle inequality should also hold for the weight function

W(i,j) = w_i • w_{i+1} • ··· • w_j

Thus

i ≤ i′ ≤ j ≤ j′ ⇒ W(i,j) + W(i′,j′) ≤ W(i,j′) + W(i′,j)

The second condition is that W should be monotonic in the sense that

i ≤ i′ ≤ j′ ≤ j ⇒ W(i′,j′) ≤ W(i,j)

The monotonicity condition can be simplified (see the exercises) to read

A ≤ A • B ∧ B ≤ A • B

for all weights A and B. Similarly, the quadrangle inequality can be simplified to
read

(A • B) + (B • C) ≤ (A • B • C) + B

for all weights A, B, and C. For example, with (•) = (+), the monotonicity and
QI conditions are immediate, provided sizes are nonnegative. With (•) = (×), the
QI condition is 0 ≤ B(A − 1)(C − 1), which holds if all weights are positive. The
monotonicity condition also holds if all weights are positive. However, the QI
condition fails for (•) = max, and the monotonicity condition for (•) = min.

We emphasise that these conditions are sufficient for an O(n²) algorithm, but not
necessary conditions. Proof of (14.1) is a little complicated; for now we just accept
the result and move on to examples.


14.3 Examples 

So far, we have a cubic-time algorithm if g is associative, and a quadratic-time one
if f = g and the monotonicity and QI conditions are satisfied. But it is also possible
to have a linear-time, or even constant-time, algorithm, depending on the values of 
these two functions. To appreciate the range of possibilities, we will now look at a 
number of instructive examples. 

Concatenation. The first example, an old friend, concerns the best way to concatenate
a list of lists. It takes m steps to concatenate a list of length m with a list of
length n and the result is a list with length m + n, so we can take f = (≪), where
m ≪ n = m, and g = (+). That means there is a cubic-time algorithm to determine
the best way of concatenating lists of lists.

However, there is a much simpler method. Suppose the lists are xs_j for 1 ≤ j ≤ n.
The minimum possible cost of carrying out the concatenation is given by

\sum_{j=1}^{n-1} x_j

where x_j is the length of xs_j. To see this, observe that each element of xs_j for 1 ≤ j < n
has to be concatenated with some list to its right, contributing at least x_j to the cost.
The minimum cost can be achieved by bracketing from the right. In other words, the
standard definition concat = foldr (++) [ ] is, as expected, the best possible way to
concatenate a list of lists. One can regard this solution as taking constant time since
no work has to be done in finding the best bracketing. Of course, it takes linear time
actually to build the tree. Essentially the same result, namely that bracketing from
the right is optimal, holds for f = (≪) and g = (×). The dual result, namely that
folding from the left is optimum, holds if f = (≫), where m ≫ n = n.

Adding numbers. Next, what is the best way of adding a list of decimal integers
together? Integer addition is commutative as well as associative, but we will ignore
this fact in what follows. Here the problem is to compute \sum_{k=1}^{n} x_k, where x_k is an
integer with d_k digits. We will suppose that adding an m-digit integer to an n-digit
integer takes (m min n) steps and yields an integer of size (m max n). Thus f = min
and g = max. These estimates are not quite accurate for integer addition, owing
to possible carries, so the claim below holds only when no carries are involved in
any addition. Since g is associative, there is a cubic-time solution. However, there
is a simple constant-time solution: any way of bracketing the additions is as good
as any other. We claim that the cost of any bracketing is the sum of the lengths
of the integers minus a maximum length. More precisely, let S(i,j) = \sum_{k=i}^{j} d_k and
M(i,j) = \max_{i \le k \le j} d_k. Then we claim that the cost C(1,n) of adding n numbers is
C(1,n) = S(1,n) − M(1,n), irrespective of the bracketing. The proof is by induction.
For the base case we have

C(1,1) = 0 = S(1,1) − M(1,1)

since the cost of performing no additions is zero. For the induction step, we have

    C(1,n)

  =   { assuming an initial split at position j }

    C(1,j) + C(j+1,n) + (M(1,j) min M(j+1,n))

  =   { induction }

    S(1,j) − M(1,j) + S(j+1,n) − M(j+1,n) + (M(1,j) min M(j+1,n))

  =   { definition of S }

    S(1,n) − M(1,j) − M(j+1,n) + (M(1,j) min M(j+1,n))

  =   { arithmetic: x + y = x min y + x max y }

    S(1,n) − (M(1,j) max M(j+1,n))

  =   { definition of M }

    S(1,n) − M(1,n)

All ways of summing the numbers therefore have the same cost, so the way the
brackets are inserted is immaterial. It therefore takes no work to find the solution,
though of course again it takes linear time to build the tree.

Multiplying numbers. Next, consider the cost of multiplying a list of decimal numbers
together, assuming that multiplying an $m$-digit number by an $n$-digit number
takes exactly $m \times n$ multiplications and gives an answer of length $m + n$. Thus
$f = (\times)$ and $g = (+)$. Again, these estimates are not quite accurate for integer
multiplication, owing to possible carries. As in the case of addition, we can improve on
the cubic-time algorithm because any way of bracketing the multiplications has the
same cost as any other. To define this cost, let $S(i,j) = \sum_{k=i}^{j} d_k$ and
$Q(i,j) = \sum_{k=i}^{j} d_k^2$. Then the common cost is given by

$C(1,n) = (S(1,n)^2 - Q(1,n))/2$

The proof is by induction. The base case is

$C(1,1) = 0 = (S(1,1)^2 - Q(1,1))/2$

[Figure 14.1 The Amoeba Fight Show]

The induction step is

$C(1,n)$

= { assuming an initial split at $j$ }

$C(1,j) + C(j+1,n) + S(1,j)\,S(j+1,n)$

= { induction }

$(S(1,j)^2 - Q(1,j) + S(j+1,n)^2 - Q(j+1,n))/2 + S(1,j)\,S(j+1,n)$

= { arithmetic: $(x^2 + y^2)/2 + xy = (x+y)^2/2$ }

$(S(1,n)^2 - Q(1,n))/2$

Therefore the multiplications can be performed in any order. This is also a constant-time
solution.
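The two closed forms above can be checked by brute force on a small input. The following is only a sketch: the names `trees`, `cost`, `additionCosts` and `multiplicationCosts` are illustrative and not from the text, and the charge for an addition is assumed to be $f = \min$ with result size $g = \max$, as the derivation for addition suggests.

```haskell
-- Leaf trees whose leaves carry digit lengths.
data Tree = Leaf Int | Fork Tree Tree

-- All ways of bracketing a nonempty list (exponentially many; small inputs only).
trees :: [Int] -> [Tree]
trees [d] = [Leaf d]
trees ds  = [ Fork l r
            | k <- [1 .. length ds - 1]
            , l <- trees (take k ds)
            , r <- trees (drop k ds) ]

-- Generic cost: f charges for one operation, g gives the size of its result.
cost :: (Int -> Int -> Int) -> (Int -> Int -> Int) -> Tree -> Int
cost f g = fst . go
  where
    go (Leaf d)   = (0, d)
    go (Fork l r) = (cl + cr + f sl sr, g sl sr)
      where (cl, sl) = go l
            (cr, sr) = go r

-- Costs of every bracketing, for addition (f = min, g = max)
-- and for multiplication (f = (*), g = (+)).
additionCosts, multiplicationCosts :: [Int] -> [Int]
additionCosts       = map (cost min max) . trees
multiplicationCosts = map (cost (*) (+)) . trees
```

For digit lengths [3,1,4,1,5] we have $S = 14$, $M = 5$ and $Q = 52$, so every entry of `additionCosts` should be $14 - 5 = 9$ and every entry of `multiplicationCosts` should be $(196 - 52)/2 = 72$.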

Multiplying matrices. As we have seen, the situation changes when the objects to
be multiplied are matrices, not numbers. In this case the function $r$ is not monotonic.
For example, take the four matrices $M_1, M_2, M_3, M_4$ with dimensions $2 \times 3$, $3 \times 2$,
$2 \times 10$, and $10 \times 1$, respectively. As can easily be verified, the best order to compute
$M_1 M_2 M_3$ is to parenthesise it as $(M_1 M_2)\,M_3$ with root 2, while the best way to
compute $M_1 M_2 M_3 M_4$ is to parenthesise it as $M_1\,(M_2\,(M_3 M_4))$ with root 1. That
means that only the cubic-time dynamic programming solution applies. In fact there
is an $O(n \log n)$ solution for the matrix multiplication problem (see the chapter
notes) but it is too complicated to be described here.
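The counterexample can be verified mechanically. The sketch below enumerates every bracketing and reports the minimum cost together with the root, taken here to be the number of matrices in the left operand of the outermost product. It assumes the usual charge of $p \times q \times r$ scalar multiplications for a $p \times q$ by $q \times r$ product; the names `eval`, `root` and `best` are illustrative, not from the text.

```haskell
type Dim = (Int, Int)            -- (rows, columns)

data Tree = Leaf Dim | Fork Tree Tree

-- All bracketings of a nonempty chain of matrices.
trees :: [Dim] -> [Tree]
trees [d] = [Leaf d]
trees ds  = [ Fork l r
            | k <- [1 .. length ds - 1]
            , l <- trees (take k ds)
            , r <- trees (drop k ds) ]

-- Cost of a bracketing and the dimensions of its result.
eval :: Tree -> (Int, Dim)
eval (Leaf d)   = (0, d)
eval (Fork l r) = (cl + cr + p * q * s, (p, s))
  where (cl, (p, q)) = eval l
        (cr, (_, s)) = eval r

size :: Tree -> Int
size (Leaf _)   = 1
size (Fork l r) = size l + size r

-- Position of the initial split.
root :: Tree -> Int
root (Leaf _)   = 0
root (Fork l _) = size l

-- Minimum cost and the root that achieves it (smallest root on ties).
best :: [Dim] -> (Int, Int)
best ds = minimum [ (fst (eval t), root t) | t <- trees ds ]
```

With `ms = [(2,3),(3,2),(2,10),(10,1)]`, `best (take 3 ms)` gives cost 52 at root 2, whereas `best ms` gives cost 32 at root 1: extending the chain on the right moved the root to the left.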

Amoeba fight show. This example owes its setting to [13]. Imagine a line of cannibalistic
amoebae, each separated from its neighbour by a sliding door, as shown
in Figure 14.1. Removing a door enables two neighbouring amoebae to fight. The
winner of the fight is always the heavier amoeba, which absorbs its lighter companion,
increasing its weight in the process. The duration of the fight is proportional to
the weight of the lighter amoeba. At the end of all the fights is a single fat amoeba
whose weight has been increased by the sum of all the losers. The compere wants
the show to be over as quickly as possible, for fast audience turnover. What is the
best way of arranging the fights, that is, the best order for removing the sliding
doors?

More prosaically, we seek an optimum bracketing where the relevant definitions
are $f = \min$ and $g = (+)$. There is therefore a cubic-time algorithm for the problem.
However, we can put a lower bound to the cost of a show: each amoeba except one
has to lose its life. That means the minimum cost is at least the sum of the weights
of all the amoebae except a largest one. This bound can be achieved by the simple
expedient of letting a heaviest amoeba fight at each step. The solution is not unique,
for the two fights ((((3 6) 2) 1) 5) and (3 (((6 2) 1) 5)) both have minimum
cost 11. One method for constructing a best bracketing is given by

    mct xs = foldr Fork e (map Leaf ys)
      where e = foldl Fork (Leaf z) (map Leaf zs)
            (ys, z:zs) = span (/= maximum xs) xs

We split a sequence into those elements before a (first) maximum value and those
afterwards. For example, [3,6,2,1,5] is split into the two component lists [3]
and [6,2,1,5]. Note that the first element of the second list is a maximum element.
The two lists are combined by folding from the left the elements in the
second list, and then folding the result from the right with the first list. This algorithm
takes linear time. But it is not an example of a greedy algorithm, at least
not one built on the inductive definition of mktrees in Chapter 8. For example,
there are three trees over [2,1,7,3] with minimum cost, namely (2 (1 (7 3))),
(2 ((1 7) 3)), and ((2 (1 7)) 3), but none of them can be extended to the unique
solution ((((9 2) 1) 7) 3) for the fringe [9,2,1,7,3].
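The construction can be run as it stands once a tree type is supplied. The sketch below adds a `Tree` declaration and a function `showCost` (an illustrative name, not from the text) that measures the duration of a show with $f = \min$ and $g = (+)$, so that the lower bound can be compared with the cost actually achieved. It assumes a nonempty input.

```haskell
data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving (Eq, Show)

-- Split at the first maximum, fold the tail from the left, then the
-- prefix from the right, so a heaviest amoeba takes part in every fight.
mct :: [Int] -> Tree Int
mct xs = foldr Fork e (map Leaf ys)
  where e            = foldl Fork (Leaf z) (map Leaf zs)
        (ys, z : zs) = span (/= maximum xs) xs

-- Total duration of the show: each fight costs the lighter weight,
-- and the winner carries on with the combined weight.
showCost :: Tree Int -> Int
showCost = fst . go
  where
    go (Leaf w)   = (0, w)
    go (Fork l r) = (cl + cr + min wl wr, wl + wr)
      where (cl, wl) = go l
            (cr, wr) = go r

-- The lower bound from the text: everyone but a largest amoeba loses.
lowerBound :: [Int] -> Int
lowerBound xs = sum xs - maximum xs
```

For [3,6,2,1,5] the result is the tree (3 (((6 2) 1) 5)), whose cost 11 meets the lower bound $17 - 6 = 11$.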

Restricted Huffman coding. The cost function for Huffman coding is given by
$\sum_{j=1}^{n} x_j d_j$, where $d_j$ is the depth of the leaf containing $x_j$. As we saw in Section 8.2,
the same cost function is given by taking $f = g = (+)$ in the optimum bracketing
version of the problem. In the restricted version of Huffman coding the fringe of the
final tree has to be exactly the list of elements in the input. In Huffman's algorithm
the pair whose joint weight is the smallest is combined at each step, but that idea
does not work for the restricted version. For example, the best tree for the fringe
[10,13,9,14] is ((10 13) (9 14)), whose cost is 92, but combining the smallest
pair at each step would lead to the tree ((10 (13 9)) 14) with cost 100. Other ideas,
like choosing a split that best equalises the sum of weights in each half, also do
not work. However, the monotonicity and quadrangle inequality conditions hold,
so there is a quadratic-time algorithm for the problem. There is also another, quite
different algorithm for this particular instance, the Garsia-Wachs algorithm, which
we will discuss in Section 14.6. The Garsia-Wachs algorithm can be implemented
to take $O(n \log n)$ steps for an input of length $n$.
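The failure of the smallest-pair heuristic is easy to reproduce. In the sketch below, `greedy` is one possible reading of "combine the adjacent pair with the smallest joint weight" (ties broken towards the left, which is an assumption), and `bestCost` is a brute-force minimum over all bracketings. Both names are illustrative and both functions assume a nonempty input.

```haskell
data Tree = Leaf Int | Fork Tree Tree deriving (Eq, Show)

weight :: Tree -> Int
weight (Leaf w)   = w
weight (Fork l r) = weight l + weight r

-- Cost with f = g = (+): every Fork contributes its total weight.
cost :: Tree -> Int
cost (Leaf _)     = 0
cost t@(Fork l r) = cost l + cost r + weight t

-- All bracketings that keep the fringe in the given order.
trees :: [Int] -> [Tree]
trees [w] = [Leaf w]
trees ws  = [ Fork l r
            | k <- [1 .. length ws - 1]
            , l <- trees (take k ws)
            , r <- trees (drop k ws) ]

bestCost :: [Int] -> Int
bestCost = minimum . map cost . trees

-- Repeatedly join the adjacent pair of smallest joint weight.
greedy :: [Int] -> Tree
greedy = go . map Leaf
  where
    go [t] = t
    go ts  = go (take k ts ++ [Fork (ts !! k) (ts !! (k + 1))] ++ drop (k + 2) ts)
      where sums = zipWith (+) (map weight ts) (map weight (drop 1 ts))
            k    = snd (minimum (zip sums [0 ..]))
```

On [10,13,9,14] the heuristic builds ((10 (13 9)) 14) at cost 100, while `bestCost` reports 92.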

Cartesian sums. Consider the associative operation $\oplus$ defined by

    xs ⊕ ys = [x + y | x <- xs, y <- ys]

This function arose in Section 5.5 in connection with sorting. The cost of computing
$\oplus$ on two lists of lengths $m$ and $n$ is $m \times n$ additions, and the result is a list of length
$m \times n$, so we have $f = g = (\times)$. The problem is to combine a list of nonempty lists
of numbers with $\oplus$. As we have seen, the monotonicity and quadrangle inequality
conditions are satisfied for this problem, provided each list has a positive length, so
a best bracketing can certainly be found in quadratic time.

Boustrophedon product. Finally, consider an operation known as the boustrophedon
product of two lists. Some combinatorial generation algorithms involve running
up and down one list in between generating successive elements of another list,
rather like the shuttle on a loom or an ox ploughing a field. The word boustrophedon
means 'ox-turning' in ancient Greek. The boustrophedon product (+++) of two lists
can be defined by

    (+++) :: [a] -> [a] -> [a]
    []     +++ ys = ys
    (x:xs) +++ ys = ys ++ x : (xs +++ reverse ys)

For example

    [3,4] +++ [0,1,2] = [0,1,2,3,2,1,0,4,0,1,2]
    "abc" +++ "xyz"   = "xyzazyxbxyzczyx"

The function (+++) is associative, though this fact is not obvious. So, what is the
best way of computing the boustrophedon product of a list of lists? The cost of
computing (+++) for two lists of lengths $m$ and $n$ is proportional to the length of
the result, namely $m + m \times n + n$. Thus $f = g = (\bullet)$, where $m \bullet n = m + m \times n + n$.
The monotonicity and quadrangle inequality conditions hold for this problem, so
there is a quadratic-time algorithm for computing the best way of bracketing the
boustrophedon product of a list of lists.
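The definition runs as written once the operator is given a fixity. The sketch below writes the operator as `+++` and gives it the same fixity as `++`. Both choices are assumptions made for the sake of a runnable example. The helper `size`, also an illustrative name, is the length formula $m \bullet n$.

```haskell
infixr 5 +++

-- Boustrophedon product: run through ys, alternately forwards and
-- backwards, between successive elements of the first list.
(+++) :: [a] -> [a] -> [a]
[]       +++ ys = ys
(x : xs) +++ ys = ys ++ x : (xs +++ reverse ys)

-- Length of the result for inputs of lengths m and n.
size :: Int -> Int -> Int
size m n = m + m * n + n
```

Both examples from the text can be reproduced this way, and a small instance such as ("ab" +++ "c") +++ "de" against "ab" +++ ("c" +++ "de") gives the same string under either bracketing, as associativity requires.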


14.4 Proof of monotonicity 

This section is devoted solely to the proof of (14.1). The result can be restated in 
the form 

$r(i,j) \le r(i,j+1)$ and $r(i,j) \le r(i+1,j)$ (14.3)

where $r(i,j)$ is the smallest integer in the range $i \le r < j$ for which the best bracketing
for $w_i, \ldots, w_j$ begins with the split $(w_i \ldots w_r)\,(w_{r+1} \ldots w_j)$.

Let $C(i,j)$ denote the minimum cost of bracketing $w_i, \ldots, w_j$, and $W(i,j)$ the weight
of the resulting expression. Thus $W(i,j) = w_i \bullet w_{i+1} \bullet \cdots \bullet w_{j-1} \bullet w_j$, where $f = g = (\bullet)$.
Define $C_k(i,j)$ for $i \le k < j$ by

$C_k(i,j) = C(i,k) + C(k+1,j) + W(i,j)$

Thus $C_k(i,j)$ is the cost of the bracketing $(w_i, \ldots, w_k)\,(w_{k+1}, \ldots, w_j)$. Now (14.3)
follows from the assertion that, if $r$ is the smallest value in the range $i \le r < j$ such
that $C(i,j) = C_r(i,j)$, then

$i \le q < r \;\Rightarrow\; C_r(i,j+1) < C_q(i,j+1)$

$i < q < r \;\Rightarrow\; C_r(i+1,j) < C_q(i+1,j)$

In turn, these assertions follow from

$i \le q < r \;\Rightarrow\; C_q(i,j) + C_r(i,j+1) \le C_q(i,j+1) + C_r(i,j)$ (14.4)

$i < q < r \;\Rightarrow\; C_q(i,j) + C_r(i+1,j) \le C_q(i+1,j) + C_r(i,j)$ (14.5)

By definition of $r$ we have $C_r(i,j) < C_q(i,j)$ for $i \le q < r$, so (14.4) and (14.5) give

$0 < C_q(i,j) - C_r(i,j) \le C_q(i,j+1) - C_r(i,j+1)$

$0 < C_q(i,j) - C_r(i,j) \le C_q(i+1,j) - C_r(i+1,j)$

In turn, (14.4) and (14.5) follow from the quadrangle inequality (14.2), namely

$C(i,j) + C(i',j') \le C(i,j') + C(i',j)$

Assuming (14.2), we can prove (14.4) by arguing

$C_q(i,j) + C_r(i,j+1)$

= { definition of $C_k$ }

$C(i,q) + C(q+1,j) + W(i,j) + C(i,r) + C(r+1,j+1) + W(i,j+1)$

$\le$ { (14.2), as $q+1 \le r+1 \le j < j+1$ }

$C(i,q) + C(q+1,j+1) + W(i,j+1) + C(i,r) + C(r+1,j) + W(i,j)$

= { definition of $C_q$ and $C_r$ }

$C_q(i,j+1) + C_r(i,j)$

The proof of (14.5) is similar. 

It remains to prove (14.2). The proof is by induction on $j' - i$. The claim is trivially
true when $i = i'$ or $j = j'$, so (14.2) holds when $j' - i \le 1$. For the induction step we
need to consider the cases $i' = j$ and $i' < j$ separately.

Case A: $i < i' = j < j'$. In this case (14.2) reduces to

$C(i,j) + C(j,j') \le C(i,j')$ (14.6)

if $i < j < j'$. Suppose $C(i,j') = C_r(i,j')$, where $i \le r < j'$. There are two subcases,
depending on whether $r < j$ or $j \le r$. If $r < j$, then we reason

$C(i,j) + C(j,j')$

$\le$ { since $C(i,j) \le C_r(i,j)$ for $i \le r < j$ }

$C(i,r) + C(r+1,j) + W(i,j) + C(j,j')$

$\le$ { induction (14.6), since $j' - r - 1 < j' - i$ as $i < r + 1$ }

$C(i,r) + C(r+1,j') + W(i,j)$

$\le$ { assumption; see below }

$C(i,r) + C(r+1,j') + W(i,j')$

= { definition of $r$ }

$C(i,j')$

The assumption on $W$ is the case $i = i'$ of the monotonicity condition

$i \le i' \le j \le j' \;\Rightarrow\; W(i',j) \le W(i,j')$

The case $j \le r$ is handled in the same way and requires case $j = j'$ of the monotonicity
condition on $W$.

Case B: $i < i' < j < j'$. In this case suppose the two terms on the right-hand side of
(14.2) are minimised at $r$ and $s$, so

$C(i',j) = C_r(i',j)$ and $C(i,j') = C_s(i,j')$

where $i' \le r < j$ and $i \le s < j'$. Again there are two symmetric subcases. If $s \le r$,
then we reason

$C(i,j) + C(i',j')$

$\le$ { definitions of $r$ and $s$ }

$C_s(i,j) + C_r(i',j')$

= { definition of $C_k$ }

$C(i,s) + C(s+1,j) + W(i,j) + C(i',r) + C(r+1,j') + W(i',j')$

$\le$ { induction }

$C(i,s) + C(s+1,j') + W(i,j) + C(i',r) + C(r+1,j) + W(i',j')$

$\le$ { assumption; see below }

$C(i,s) + C(s+1,j') + W(i,j') + C(i',r) + C(r+1,j) + W(i',j)$

= { definition of $C_k$ }

$C_s(i,j') + C_r(i',j)$

= { definition of $r$ and $s$ }

$C(i,j') + C(i',j)$

The assumption is just the quadrangle inequality condition on $W$. The case $r \le s$ is
handled similarly and also requires the quadrangle inequality. This completes the
proof of (14.1).
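As a sanity check on the statements just proved, the quadrangle inequality (14.2) and the two inequalities of (14.3) can be tested by exhaustive computation on a short list. The sketch below fixes $f = g = (+)$, for which $W$ is monotonic and satisfies the quadrangle inequality, and evaluates $C$ by naive recursion. All the function names are illustrative, and the recursion is exponential, so it is meant only for small inputs.

```haskell
-- W(i,j): weight of the segment w_i .. w_j (1-based, inclusive), with g = (+).
weightW :: [Int] -> Int -> Int -> Int
weightW ws i j = sum (take (j - i + 1) (drop (i - 1) ws))

-- C_k(i,j): cost of the best bracketing that begins with a split after k.
costAt :: [Int] -> Int -> Int -> Int -> Int
costAt ws k i j = minCost ws i k + minCost ws (k + 1) j + weightW ws i j

-- C(i,j): minimum cost of bracketing w_i .. w_j.
minCost :: [Int] -> Int -> Int -> Int
minCost ws i j
  | i >= j    = 0
  | otherwise = minimum [costAt ws k i j | k <- [i .. j - 1]]

-- r(i,j): the smallest k with C(i,j) = C_k(i,j); requires i < j.
root :: [Int] -> Int -> Int -> Int
root ws i j = snd (minimum [(costAt ws k i j, k) | k <- [i .. j - 1]])

-- (14.2) for all i <= i' <= j <= j'.
quadrangleHolds :: [Int] -> Bool
quadrangleHolds ws = and
  [ minCost ws i j + minCost ws i' j' <= minCost ws i j' + minCost ws i' j
  | i <- [1 .. n], i' <- [i .. n], j <- [i' .. n], j' <- [j .. n] ]
  where n = length ws

-- (14.3) wherever both sides are defined.
monotonicHolds :: [Int] -> Bool
monotonicHolds ws = and
  (  [ root ws i j <= root ws i (j + 1) | i <- [1 .. n], j <- [i + 1 .. n - 1] ]
  ++ [ root ws i j <= root ws (i + 1) j | i <- [1 .. n], j <- [i + 2 .. n] ] )
  where n = length ws
```

Taking the minimum of (cost, k) pairs picks the smallest k among equally good splits, which matches the definition of $r$ used in the proof.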


14.5 Optimum binary search trees 

We turn next to a close cousin of optimum bracketing, namely the problem of 
building an optimum binary search tree. One way to build a binary search tree was 
described in Section 4.3, where we showed how to balance a tree so that no search 
takes more than logarithmic time. In practice, however, different keys have different 
probabilities of occurring as the argument of a search. A better organisation would 
be to have keys with a high frequency of occurring closer to the root. For example, 
suppose we wanted to search for all occurrences of the nine-letter words in this 
book, say for the purpose of preparing an index. It turns out that the word ‘algorithm’ 
appears much more frequently than ‘condition’ or ‘operation’, so that key should be 
closer to the root of the tree. 

Suppose we are given probabilities $p_1, p_2, \ldots, p_n$, expressed as integer frequency
counts, so that $p_j$ is the probability that the argument of a successful search is the
value $x_j$ in a list $x_1, x_2, \ldots, x_n$ of increasing values. Suppose $q_0, q_1, \ldots, q_n$ is another
list so that $q_j$ is the probability that the argument of an unsuccessful search falls
between the two values $x_j$ and $x_{j+1}$. By convention, $q_0$ represents the probability
that the search argument is less than $x_1$ and $q_n$ the probability that it is greater than
$x_n$. We can install these values in a modified binary search tree in which Null nodes
are replaced by leaf nodes containing $q$-values, and internal nodes are augmented
with $p$-values.

Thus we define a binary search tree to be

    data BST a = Leaf Nat | Node Nat (BST a) a (BST a)

For example, ignoring x-values, a simple example is



The cost of this tree is $2q_0 + 2p_1 + 3q_1 + 3p_2 + 3q_2 + p_3 + q_3$, which is the scalar
product of the result of flattening the tree and the depths of the nodes, where the
depths of non-leaf nodes are counted from 1 rather than 0. In general,

    cost :: BST a -> Nat
    cost t = sum (zipWith (*) (flatten t) (depths t))
      where
        flatten :: BST a -> [Nat]
        flatten (Leaf q)       = [q]
        flatten (Node p l x r) = flatten l ++ [p] ++ flatten r

        depths :: BST a -> [Nat]
        depths = from 0
          where from d (Leaf _)       = [d]
                from d (Node _ l _ r) = from (d+1) l ++ [d+1] ++ from (d+1) r

We saw a similar definition of cost in Huffman coding. Moreover, as with the cost
function in Huffman coding, we can express cost recursively:

    cost (Leaf q)       = 0
    cost (Node p l x r) = cost l + cost r + weight (Node p l x r)

    weight (Leaf q)       = q
    weight (Node p l x r) = p + weight l + weight r

It follows that the cost $C(i,j)$ of building a binary search tree with frequency counts
$p_{i+1}, \ldots, p_j$ and $q_i, \ldots, q_j$ is given by

$C(i,j) = \min_{k=i}^{j-1}\,(C(i,k) + C(k+1,j)) + w(i,j)$

where $w(i,j) = q_i + p_{i+1} + \cdots + p_j + q_j$. The function $w$ is monotonic and
satisfies the quadrangle inequality, so (14.1) holds and there is a quadratic-time
dynamic programming algorithm for constructing a binary search tree with minimum
cost.
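The two definitions of cost and the recurrence for $C(i,j)$ can be compared on a small instance. The sketch below is hedged in several ways: `Nat` is taken to be `Int`, the names `costByDepths`, `costRec` and `optCost` are illustrative, the shape of `example` is a reconstruction from the cost expression $2q_0 + 2p_1 + 3q_1 + 3p_2 + 3q_2 + p_3 + q_3$, and `optCost` is a naive exponential evaluation of the recurrence rather than the quadratic-time algorithm.

```haskell
type Nat = Int

data BST a = Leaf Nat | Node Nat (BST a) a (BST a)

flatten :: BST a -> [Nat]
flatten (Leaf q)       = [q]
flatten (Node p l _ r) = flatten l ++ [p] ++ flatten r

-- Depths in flattening order; internal nodes are counted from 1.
depths :: BST a -> [Nat]
depths = from 0
  where
    from d (Leaf _)       = [d]
    from d (Node _ l _ r) = from (d + 1) l ++ [d + 1] ++ from (d + 1) r

costByDepths :: BST a -> Nat
costByDepths t = sum (zipWith (*) (flatten t) (depths t))

weight :: BST a -> Nat
weight (Leaf q)       = q
weight (Node p l _ r) = p + weight l + weight r

costRec :: BST a -> Nat
costRec (Leaf _)         = 0
costRec t@(Node _ l _ r) = costRec l + costRec r + weight t

-- C(i,j) for counts ps = [p_1..p_n] and qs = [q_0..q_n].
optCost :: [Nat] -> [Nat] -> Int -> Int -> Nat
optCost ps qs i j
  | i == j    = 0
  | otherwise = minimum [ optCost ps qs i k + optCost ps qs (k + 1) j
                        | k <- [i .. j - 1] ] + w
  where w = sum (take (j - i + 1) (drop i qs)) + sum (take (j - i) (drop i ps))

-- Root p3 with left child p1, whose right child is p2.
example :: BST Char
example = Node 7 (Node 5 (Leaf 1) 'a' (Node 6 (Leaf 2) 'b' (Leaf 3))) 'c' (Leaf 4)
```

With $p = [5,6,7]$ and $q = [1,2,3,4]$ both cost functions give 56 for `example`, while `optCost [5,6,7] [1,2,3,4] 0 3` is 50, attained by putting $p_2$ at the root.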


14.6 The Garsia-Wachs algorithm 

When the frequency counts pj are all zero, so only the costs of unsuccessful searches 
matter, the problem of finding an optimum search tree is essentially the same as 
that of the restricted version of Huffman coding in which the fringe has to be 
exactly the given list. In turn, this is exactly the instance of optimum bracketing
in which $f = g = (+)$. For these particular values of $f$ and $g$ there is another, quite
different algorithm for computing a tree with minimum cost. The algorithm is 
known as the Garsia-Wachs algorithm and is fairly easy to describe - at least in 
an unoptimised form - but even the best current proof of its correctness has some 
tricky details, so we will omit it. References to published proofs are given in the 
chapter notes. 

The Garsia-Wachs algorithm is a two-stage process (see Exercise 14.14 as to why 
two stages appear to be necessary). In the first stage we build a tree from the given 
list of weights, and in the second stage we rebuild it. With Weight as a synonym for 
Int, we have 

    gwa :: [Weight] -> Tree Weight
    gwa ws = rebuild ws (build ws)

With Label as another synonym for Int, the types of build and rebuild are

    build   :: [Weight] -> Tree Label
    rebuild :: [Weight] -> Tree Label -> Tree Weight

The result of build ws is a tree whose fringe is not ws but some permutation of
the labels [1..n], where $n$ is the length of ws. The critical property of this tree
concerns the depths of its leaves. Suppose the depths are $d_1, d_2, \ldots, d_n$, where $d_j$ is
the depth of Leaf j. Then there is a tree with minimum cost and fringe ws in which
the depth of the leaf labelled with $w_j$ is $d_j$. As an example, suppose build applied to
[27,16,11,70,21,31,65] produces the tree

    (((5 6) 7) ((1 (2 3)) 4))

The list of depths in numerical order of leaf value is [3,4,4,2,3,3,2]. The claim,
which we will not prove, is that there is a minimum-cost tree for the given input
whose depths in fringe order constitute exactly this list, and that tree is

    (((27 (16 11)) 70) ((21 31) 65))



This tree can be obtained from the one above by a simple bottom-up algorithm.
The starting point is a list of pairs, the first component of each pair being a leaf
containing the required label $w_j$, and the second component being the depth $d_j$. For
our example this is the list

(27,3) (16,4) (11,4) (70,2) (21,3) (31,3) (65,2)

in which a pair (w,d) represents (Leaf w, d). This list of pairs is reduced to a single
pair by repeatedly combining the first two adjacent pairs in the list with the same 
depth until only a single pair remains. When two pairs with a common depth are 
combined, the depth is reduced by one. Thus for our example we get the sequence 
of steps pictured in Figure 14.2, ending with a single tree and a final depth of 0. This 
process is not guaranteed to work for all lists of depths, but it does work for those 
that result from the first stage of the Garsia-Wachs algorithm. For example, the tree 
((1 3) 2) has depths [2,1,2] in numerical order of leaf label, but no adjacent pair 
has the same depth, so no pair can be combined, and the reduction process fails to 
make progress. 

    (27,3) (16,4) (11,4) (70,2) (21,3) (31,3) (65,2)
    (27,3) ((16 11),3) (70,2) (21,3) (31,3) (65,2)
    ((27 (16 11)),2) (70,2) (21,3) (31,3) (65,2)
    (((27 (16 11)) 70),1) (21,3) (31,3) (65,2)
    (((27 (16 11)) 70),1) ((21 31),2) (65,2)
    (((27 (16 11)) 70),1) (((21 31) 65),1)
    ((((27 (16 11)) 70) ((21 31) 65)),0)

Figure 14.2 Combining trees

The obvious way to implement this reduction process is by a function reduce:

    reduce :: [(Tree Label, Depth)] -> Tree Label
    reduce = extract . until single step where
      extract [(t,_)] = t
      step (x:y:xs) = if depth x == depth y then join x y : xs else x : step (y:xs)
      join (t1,d) (t2,_) = (Fork t1 t2, d-1)

where Depth is a synonym for Int and depth = snd. The function step is applied
repeatedly until it produces a singleton list. However, this definition of reduce
can take quadratic time because step can take linear time. The inefficiency arises
because, if step finds the first pair to be joined at positions $k$ and $k+1$, then the next
call of step will repeat the unsuccessful search on the first $k-2$ elements when it
could begin a new search at position $k-1$, the earliest position at which two depths
could be the same. One way to avoid the inefficiency is to use a foldl and a recursive
definition of step, redefining reduce to read

    reduce :: [(Tree Label, Depth)] -> Tree Label
    reduce = extract . foldl step [] where
      extract [(t,_)] = t
      step [] y = [y]
      step (x:xs) y = if depth x == depth y then step xs (join x y) else y:x:xs
      join (t1,d) (t2,_) = (Fork t1 t2, d-1)

The first argument to step maintains the invariant that no two adjacent pairs on the 
list have the same depth; this list is kept in reverse order for efficiency. To maintain 
the invariant, step is called recursively whenever two pairs are joined. Each call of 
step takes time proportional to the number of join operations, and there are exactly 
n — 1 of these operations in total, so reduce now takes linear time. 
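The linear-time version can be exercised on the running example. The sketch below supplies the missing declarations (`Tree`, `Depth`), generalises the leaf type, and adds an error clause to `extract` for depth lists that do not reduce to a single tree. These additions are assumptions made for a self-contained example.

```haskell
data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving (Eq, Show)

type Depth = Int

-- The accumulated list is kept in reverse order, and no two adjacent
-- pairs on it ever have the same depth.
reduce :: [(Tree a, Depth)] -> Tree a
reduce = extract . foldl step []
  where
    extract [(t, _)] = t
    extract _        = error "reduce: depths do not describe a tree"

    step [] y = [y]
    step (x : xs) y
      | snd x == snd y = step xs (join x y)   -- x lies to the left of y
      | otherwise      = y : x : xs

    join (t1, d) (t2, _) = (Fork t1 t2, d - 1)

-- The list of (leaf, depth) pairs from the running example.
example :: [(Tree Int, Depth)]
example = zip (map Leaf [27,16,11,70,21,31,65]) [3,4,4,2,3,3,2]
```

Here `reduce example` yields (((27 (16 11)) 70) ((21 31) 65)), the last tree in Figure 14.2. The individual joins happen in a slightly different order from the figure, because a newly joined pair is compared with its left neighbour straight away, but the final tree is the same.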

Having dealt with reduce, we can now define rebuild: 

    rebuild :: [Weight] -> Tree Label -> Tree Weight
    rebuild ws = reduce . zip (map Leaf ws) . sortDepths

The function sortDepths sorts the depths of a tree into increasing order of label
value. Since labels take the form [1,2,...,n], where $n$ is the number of nodes in the
tree, sorting can be accomplished in linear time by using an array:

    sortDepths :: Tree Label -> [Depth]
    sortDepths t = elems (array (1, size t) (zip (fringe t) (depths t)))

The functions size, which counts the number of nodes in a tree, fringe and depths
can be computed in linear time, so sortDepths and rebuild each take linear time.

It remains to deal with the first stage of the Garsia-Wachs algorithm, the function 
build. This is where the intricacy of the algorithm resides. The plan of attack is to 
develop a quadratic-time solution first, and then improve it to a linearithmic one by 
a suitable choice of data structure. 

For input $[w_1, w_2, \ldots, w_n]$, the starting point is a list

$(0,w_0),\ (1,w_1),\ (2,w_2),\ \ldots,\ (n,w_n)$

of pairs of leaves and weights, so $(j,w)$ abbreviates (Leaf j, w). The first pair $(0,w_0)$
is a sentinel pair in which $w_0 = \infty$. Use of a sentinel simplifies the description of the
algorithm but is not essential (see the exercises). The following two steps are now
repeated until just two pairs remain, the sentinel pair and one other:

1. Given the current list $[(0,w_0), \ldots, (t_p,w_p)]$, where $p > 1$, find the largest $j$ in the
range $1 \le j < p$ such that $w_{j-1} + w_j \ge w_j + w_{j+1}$, equivalently, $w_{j-1} \ge w_{j+1}$.
Such a $j$ is guaranteed to exist since $w_0 = \infty$. Replace the two pairs $(t_j,w_j)$ and
$(t_{j+1},w_{j+1})$ by a single pair $(t^*,w^*) = (\mathit{Fork}\ t_j\ t_{j+1},\ w_j + w_{j+1})$, giving a new list

$(0,w_0),\ (t_1,w_1),\ \ldots,\ (t_{j-1},w_{j-1}),\ (t^*,w^*),\ (t_{j+2},w_{j+2}),\ \ldots,\ (t_p,w_p)$

2. Now move $(t^*,w^*)$ to the right over all pairs $(t,w)$ for which $w < w^*$.

At the end of this process there are just two pairs left, the sentinel and a second pair
whose first component is the required tree.

Here is an example. Suppose we begin with the list

(0,∞), (1,10), (2,25), (3,31), (4,22), (5,13), (6,18), (7,45)

The first pair to be combined is (5,13) and (6,18) (because $22 \ge 18$). The result is
shifted zero places to the right, giving

(0,∞), (1,10), (2,25), (3,31), (4,22), ((5 6),31), (7,45)

The next pair to be combined is (4,22) and ((5 6),31) (because $31 \ge 31$). The
result is shifted one place to the right, giving

(0,∞), (1,10), (2,25), (3,31), (7,45), ((4 (5 6)),53)

The next pair to be combined is (1,10) and (2,25), giving

(0,∞), (3,31), ((1 2),35), (7,45), ((4 (5 6)),53)

The remaining three steps are similar in that they all involve combining the second
two pairs:

(0,∞), (7,45), ((4 (5 6)),53), ((3 (1 2)),66)

(0,∞), ((3 (1 2)),66), ((7 (4 (5 6))),98)

(0,∞), (((3 (1 2)) (7 (4 (5 6)))),164)

The first component of the second pair is the final tree. Note that the sentinel plays
a passive role and is never combined with another pair.

The obvious way to implement this algorithm is repeatedly to scan the whole
list from right to left at each step, looking for the largest $j$ such that $w_{j-1} \ge w_{j+1}$.
However, a better way of organising the search stems from the following observation.
Say that a sequence $w_1, w_2, \ldots$ is two-sorted if $w_1 < w_3 < w_5 < \cdots$ and
$w_2 < w_4 < w_6 < \cdots$. It follows from the definition of $j$ in step 1 that the sequence $w_j, \ldots, w_p$ is
two-sorted. Suppose that the following sequence of weights is produced by step 2:

$w_0,\ w_1,\ w_2,\ \ldots,\ w_{j-1},\ w_{j+2},\ \ldots,\ w_{k-1},\ w^*,\ w_k,\ \ldots,\ w_p$

Again, both $w_k, \ldots, w_p$ and $w_{j+2}, \ldots, w_{k-1}, w^*$ are two-sorted because $w_{k-2} < w^*$.
Furthermore, we know that $w_{j+r} < w^*$ for $2 \le r < k - j$. That means the next pair
to be combined is the first one in the following list of three possibilities:

1. $w_k$ and $w_{k+1}$, provided $w^* \ge w_{k+1}$;

2. $w_{j+2}$ and $w_{j+3}$, provided $w_{j-1} \ge w_{j+3}$;

3. $w_i$ and $w_{i+1}$, provided $1 \le i < j - 1$ and $w_{i-1} \ge w_{i+1}$.

These cases can be captured by expressing build in terms of foldr and a new function
step:

    build :: [Weight] -> Tree Label
    build ws = extract (foldr step [] (zip (map Leaf [0..]) (infinity : ws)))
      where extract [_, (t,_)] = t
            infinity = sum ws

No weight arising during the algorithm can be greater than the sum of the input
weights, so this definition of infinity is adequate. The function foldr step [] scans
the input from right to left, looking for the next pair to be combined. To define step,
we first introduce

    type Pair = (Tree Label, Weight)

    weight :: Pair -> Weight
    weight (t,w) = w

Then step is defined by

    step :: Pair -> [Pair] -> [Pair]
    step x (y:z:xs) | weight x < weight z = x : y : z : xs
                    | otherwise           = step x (insert (join y z) xs)
    step x xs = x : xs

    join :: Pair -> Pair -> Pair
    join (t1,w1) (t2,w2) = (Fork t1 t2, w1 + w2)

    insert :: Pair -> [Pair] -> [Pair]
    insert x xs = ys ++ step x zs
      where (ys,zs) = splitList x xs

    splitList x xs = span (\y -> weight y < weight x) xs

The function insert makes use of an instance splitList of the general utility function 
span to find the right place for a combined pair to be inserted, and calls step again 
to deal with Case 1. The recursive call to step in the definition of step deals with 
Case 2, and Case 3 is handled by the right-to-left search in foldr step []. Note that 
the second argument of both step and insert is always a two-sorted list, a fact we 
will exploit later on. 
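Putting the two stages together gives a complete, if quadratic, version of the algorithm that can be run on the examples in this section. The sketch below makes a few substitutions that are not in the text: `sortDepths` sorts with `sortOn` from `Data.List` rather than an array (so it is no longer linear), `snd` is used in place of `weight` and `depth`, and error clauses are added for empty or malformed input.

```haskell
import Data.List (sortOn)

data Tree a = Leaf a | Fork (Tree a) (Tree a) deriving (Eq, Show)

type Weight = Int
type Label  = Int
type Depth  = Int
type Pair   = (Tree Label, Weight)

-- First stage: a tree over the labels 1..n with the right leaf depths.
build :: [Weight] -> Tree Label
build ws = extract (foldr step [] (zip (map Leaf [0 ..]) (infinity : ws)))
  where
    extract [_, (t, _)] = t
    extract _           = error "build: empty input"
    infinity            = sum ws

step :: Pair -> [Pair] -> [Pair]
step x (y : z : xs)
  | snd x < snd z = x : y : z : xs
  | otherwise     = step x (insert (join y z) xs)
step x xs = x : xs

join :: Pair -> Pair -> Pair
join (t1, w1) (t2, w2) = (Fork t1 t2, w1 + w2)

insert :: Pair -> [Pair] -> [Pair]
insert x xs = ys ++ step x zs
  where (ys, zs) = span (\y -> snd y < snd x) xs

fringe :: Tree a -> [a]
fringe (Leaf x)   = [x]
fringe (Fork l r) = fringe l ++ fringe r

depths :: Tree a -> [Depth]
depths = from 0
  where from d (Leaf _)   = [d]
        from d (Fork l r) = from (d + 1) l ++ from (d + 1) r

-- Depths in increasing order of label (sorting instead of an array).
sortDepths :: Tree Label -> [Depth]
sortDepths t = map snd (sortOn fst (zip (fringe t) (depths t)))

-- Second stage: combine adjacent pairs of equal depth.
reduce :: [(Tree a, Depth)] -> Tree a
reduce = extract . foldl go []
  where
    extract [(t, _)] = t
    extract _        = error "reduce: depths do not describe a tree"
    go [] y = [y]
    go (x : xs) y
      | snd x == snd y = go xs (Fork (fst x) (fst y), snd x - 1)
      | otherwise      = y : x : xs

rebuild :: [Weight] -> Tree Label -> Tree Weight
rebuild ws = reduce . zip (map Leaf ws) . sortDepths

gwa :: [Weight] -> Tree Weight
gwa ws = rebuild ws (build ws)
```

On the worked example, `build [10,25,31,22,13,18,45]` returns ((3 (1 2)) (7 (4 (5 6)))), and `gwa [27,16,11,70,21,31,65]` returns the tree (((27 (16 11)) 70) ((21 31) 65)) of Figure 14.2.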

In the worst case (see Exercise 14.13), the running time of build is quadratic 
in the length of the input. That means that the algorithm is no better than the 
dynamic programming algorithm seen earlier. The main culprit is the function 
insert, which can take linear time in the worst case. If we could arrange that insert 
took logarithmic time, then the total running time of the Garsia-Wachs algorithm
would be reduced to $O(n \log n)$ steps. Such an implementation is indeed possible,
because the second argument to insert is not an arbitrary list of pairs but one that is 
two-sorted on second components. 

The revised implementation is carried out in two stages. The first stage is to 
rewrite build in terms of a new data type List Pair, designed for representing lists of 
pairs that are two-sorted on second components. The following six operations are to 
be provided: 

    emptyL  :: List a
    nullL   :: List a -> Bool
    consL   :: a -> List a -> List a
    deconsL :: List a -> (a, List a)
    concatL :: List a -> List a -> List a
    splitL  :: Pair -> List Pair -> (List Pair, List Pair)

Most of these operations are self-explanatory. The first five functions work for lists
of any type, but the function splitL is specific to List Pair. This function is the
analogue of splitList used in the definition of insert.

The function build is replaced by a new version buildL, basically the same as 
before except that certain list operations are replaced with List operations: 

    buildL :: [Weight] -> Tree Label
    buildL ws = extractL (foldr stepL emptyL (start ws))
      where start ws = zip (map Leaf [0..]) (infinity : ws)
            infinity = sum ws

    extractL :: List Pair -> Tree Label
    extractL xs = t
      where (_,ys)    = deconsL xs
            ((t,_),_) = deconsL ys

    stepL :: Pair -> List Pair -> List Pair
    stepL x xs = if nullL xs || nullL ys || weight x < weight z
                 then consL x xs
                 else stepL x (insertL (join y z) zs)
      where (y,ys) = deconsL xs
            (z,zs) = deconsL ys

    insertL :: Pair -> List Pair -> List Pair
    insertL x xs = concatL ys (stepL x zs)
      where (ys,zs) = splitL x xs

The second stage is to implement List so that the six operations above take at most 
logarithmic time. Then buildL will take linearithmic time. There are various options, 
and we choose an implementation based on a modification of the balanced binary 
search trees of Section 4.3, henceforth called search trees to distinguish them from 
the leaf trees constructed by the algorithm. To motivate the modification, consider 
the search tree 

10 



whose nodes are labelled with pairs of leaf trees and their weights, although only the 
weights are shown. Flattening this tree produces a list of weights which is two-sorted 
but not sorted. Now suppose we want to insert a new pair with weight w into this 
search tree. We cannot use straightforward binary search because the labels of the 
search tree are not in increasing order of weight. Instead, as well as comparing w
with the weight of the leaf tree at the root, we must also compare it with the weight
of the preceding leaf tree in the list. Only if w is greater than both these weights 
can we continue by searching the right subtree; otherwise we have to search the left 
subtree. In order to avoid repeatedly having to discover the weight of the preceding 
leaf tree, we can install this tree at the root of a search tree. If there is no preceding 
leaf tree, then we can artificially install a copy of the leaf tree. That leads to the tree 

( 12 , 10 ) 



The data type List Pair is now introduced as an instance of 

    data List a = Null | Node Int (List a) (a,a) (List a)

in which nodes are labelled with pairs of values. As in Section 4.3, the first label of
a Node records the height of the tree, which is needed in order to maintain balance.
The implementations of emptyL and nullL are immediate:

    emptyL :: List a
    emptyL = Null

    nullL :: List a -> Bool
    nullL Null = True
    nullL _    = False

The operation consL adds a new pair as a leftmost element of a binary tree:

    consL :: a -> List a -> List a
    consL x Null = node Null (x,x) Null
    consL x (Node _ t1 (y,z) t2) = if nullL t1
                                   then balance (consL x t1) (x,z) t2
                                   else balance (consL x t1) (y,z) t2

For an empty tree t, the operation consL x t creates a new node with label (x,x). For
a nonempty tree t whose left subtree is empty (so y = z), consL x t creates a new
node with label (x,x) and, since x is now the preceding value of z, assigns (x,z) as
the new value at the root. Otherwise consL x is applied to the left subtree of t. The
definition makes use of the two smart constructors, node and balance, described in
Section 4.3. Recall that node is invoked only when the two trees have heights that
differ by at most one, and balance only when the two heights differ by at most two.
Next, the function deconsL is defined by

    deconsL :: List a -> (a, List a)
    deconsL (Node _ t1 (x,y) t2) = if nullL t1 then (y,t2)
                                   else (z, balance t3 (x,y) t2)
      where (z,t3) = deconsL t1

This searches along the left spine of a tree to find the first element.

The next function is concatL, which is essentially the second version of the
function combine defined in Section 4.3:

    concatL :: List a -> List a -> List a
    concatL t1 Null = t1
    concatL Null t2 = t2
    concatL t1 t2   = gbalance t1 (x,y) t3
      where x      = lastL t1
            (y,t3) = deconsL t2

The subsidiary function lastL returns the last value in a nonempty tree:

    lastL :: List a -> a
    lastL (Node _ t1 (x,y) t2) = if nullL t2 then y else lastL t2

In the third clause of concatL the last value in t1 and the first value in t2 are combined
as a new root. The definition of concatL makes use of the general rebalancing
function gbalance defined in Section 4.3.

The final function splitL is similar to the function split defined in Section 4.4.
The difference is that splitL x t has to split the tree t into a pair of trees (t1,t2) in
which t1 consists of the initial segment of t whose weight components are all less
than weight x, and t2 is the remaining final segment of t. To carry out this process,
we split a tree into pieces and then sew the pieces together again to make the final
pair of trees. Thus

    splitL :: Pair -> List Pair -> (List Pair, List Pair)
    splitL x t = sew (pieces x t)

The only difference between this definition of splitL and the definition of split in
Section 4.4 is in the definition of pieces:

    data Piece a = LP (List a) (a,a) | RP (a,a) (List a)

    pieces :: Pair -> List Pair -> [Piece Pair]
    pieces x t = addPiece t [] where
      addPiece Null ps = ps
      addPiece (Node _ t1 (y,z) t2) ps = if weight x > max (weight y) (weight z)
                                         then addPiece t2 (LP t1 (y,z) : ps)
                                         else addPiece t1 (RP (y,z) t2 : ps)

As we saw in Section 4.4, splitL takes logarithmic time in the size of the tree. This
completes the definition of the Garsia-Wachs algorithm.


14.7 Chapter notes 

The standard example of optimum bracketing, presented in most texts on algorithm
design, is matrix multiplication. In fact, there is an $O(n \log n)$ algorithm for the
multiplication of $n$ matrices due to Hu and Shing, see [6, 7], though the details
are quite complicated. For an alternative method of tabulation that uses trees with
shared subtrees rather than arrays to record partial results, see [1, Chapter 21]. The
quadratic-time algorithm was first described for the particular case of optimum
binary search trees in [11]. Our proof of the conditions under which $r$ is monotonic
is an extension of the proof given by Yao in [14]. Yao's paper also considers other
examples for which the monotonicity and QI conditions hold.

The amoeba fight show example was first described in [13]. For combinatorial 
applications of the boustrophedon product, see [1, Chapter 28]. 

The Garsia-Wachs algorithm has an interesting history. It was first discussed 
in terms of restricted Huffman coding in [4], where a cubic-time algorithm was 
proposed. As a special case of optimum binary search trees, this was reduced to 
a quadratic-time algorithm in [11]. A different method was described by Hu and 
Tucker in [8]; it is also presented in [5] and in the first edition of [12]. According 
to Knuth, “no simple proof [of the Hu-Tucker algorithm] is known, and it is quite 
possible that no simple proof will ever be found.” Then along came a modification 
of the Hu-Tucker algorithm, the Garsia-Wachs algorithm [3], which was adopted in 
the second edition of [12]. The best proof of its correctness, while still not exactly 
simple, is discussed in [10]; see also [9]. A functional description of the Garsia- 
Wachs algorithm in ML, though one that uses some non-pure functional techniques, 
was given in [2]. All of these articles describe only the quadratic-time version of 
the algorithm. There is a fairly short appendix to [3], written by Robert E. Tarjan, 
that outlines how to implement the $O(n \log n)$ version. Knuth also mentions the
sub-quadratic algorithm in an exercise [12, Section 6.2.2, Exercise 45], which is 
answered fairly cryptically on page 713. As far as we know, our description of an 
optimal, purely functional implementation of the Garsia-Wachs algorithm is new. 

Exercises

Exercise 14.1 Write down the five ways of bracketing $X_1 \otimes X_2 \otimes X_3 \otimes X_4$. How
many ways of bracketing five $X$s are there?

Exercise 14.2 Show that if $T(1) = 1$ and

$$T(n) = n + \sum_{k=1}^{n-1} \bigl(T(k) + T(n-k)\bigr)$$

then $T(n) = (3^n - 1)/2$.

Exercise 14.3 Suppose that the function cost associated with optimum bracketing 
is generalised to read 

    cost (Leaf w)     = 0
    cost (Fork t1 t2) = h (cost t1) (cost t2) + f (size t1) (size t2)

Under what conditions on h would the cubic-time algorithm in the text still work? 
What might be an appropriate value of h if both subtrees could be computed in 
parallel? 

Exercise 14.4 Suppose we replace the definition of table (i,j) in the quadratic
algorithm for mct by

    table (i,j)
      | i == j    = (i, 0, weights ! i, Leaf (weights ! i))
      | otherwise = minWith cost [fork k (t ! (i,k)) (t ! (k+1,j))
                                 | k <- [r (i,j-1) .. r (i+1,j)]]

What goes wrong? 

Exercise 14.5 Show that the monotonicity condition

$$i' \le i \le j \le j' \;\Rightarrow\; S(i,j) \le S(i',j')$$

follows from $A \le A \bullet B$ and $B \le A \bullet B$ for all sizes $A$ and $B$.

Exercise 14.6 Similarly, show that the quadrangle inequality

$$i \le i' \le j \le j' \;\Rightarrow\; S(i,j) + S(i',j') \le S(i,j') + S(i',j)$$

follows from $(A \bullet B) + (B \bullet C) \le (A \bullet B \bullet C) + B$.

Exercise 14.7 Show that the quadrangle inequality fails for (•) = max. 

Exercise 14.8 Verify the monotonicity and quadrangle inequality conditions when
$m \bullet n = m + m \times n + n$, where $m$ and $n$ are nonnegative.

Exercise 14.9 When $f \ne g$ the conditions that ensure (14.1) are as follows. With
$f = (\circ)$ and $g = (\bullet)$ the first condition is that

$$A \circ B \le A \circ (B \bullet X) \;\land\; A \circ B \le (X \bullet A) \circ B$$

This generalises the monotonicity condition. The generalisation of the QI condition
is more complicated. First of all, say that $X$ is a right-factor of $Y$ if $Y = X$ or
$Y = Z \bullet X$ for some $Z$. Dually, $X$ is a left-factor of $Y$ if $Y = X$ or $Y = X \bullet Z$ for some
$Z$. The two conditions are firstly that, if $C \bullet D$ is a right-factor of $A \bullet B$, then

$$A \circ B + C \circ (D \bullet X) \le A \circ (B \bullet X) + C \circ D$$

and secondly that, if $C \bullet D$ is a left-factor of $A \bullet B$, then

$$A \circ B + (X \bullet C) \circ D \le (X \bullet A) \circ B + C \circ D$$

As we said in the text, it is difficult to find useful examples in which these conditions
are satisfied. Do they hold when $m \circ n = m$ and $(\bullet) = (+)$? How about $m \circ n = m$
and $(\bullet) = (\times)$?

Exercise 14.10 Given that the boustrophedon product (++) satisfies the two
equations

    (xs ++ [x] ++ ys) (++) zs = (xs (++) zs) ++ [x] ++ (ys (++) rev xs zs)
    reverse (ys (++) zs)      = reverse ys (++) rev ys zs

where the function rev is defined by

    rev xs zs = if even (length xs) then reverse zs else zs

prove that (++) is associative.

Exercise 14.11 Suppose in Section 14.4 that $r(i,j)$ is defined to be the largest best
split rather than the smallest. Then (14.3) is proved by showing that $C_q(i,j+1) \ge
C_r(i,j+1)$ for $i \le q < r$, and $C_q(i+1,j) \ge C_r(i+1,j)$ for $i+1 \le q < r$. Prove that
these two facts follow from (14.4) and (14.5).

Exercise 14.12 Use of a special sentinel in the building phase of the Garsia-Wachs 
algorithm is not essential, provided we change the definition of build to read 

    build = endstep . foldr step [] . zip (map Leaf [1..])

Give the definition of endstep. 

Exercise 14.13 Consider the following input to build: 

    [k, k, k+1, k+1, ..., 2k-1, 2k-1]

How long does build take? 

Exercise 14.14 One may wonder whether the two stages of the Garsia-Wachs 
algorithm are necessary. To show that they are, we can consider two obvious 
simplifications, neither of which contains a labelling stage. One is to follow the 
algorithm for Huffman coding by combining at each step two adjacent trees with 
minimum combined weight. Ties can be broken arbitrarily by choosing the first such 
pair. However, unlike Huffman coding, the combined tree is not moved over other 
trees in order to maintain the same fringe. Here are the steps for input [4,2,4,4,7]: 


    (4,4)  (2,2)  (4,4)  (4,4)  (7,7)
    ((4 2),6)  (4,4)  (4,4)  (7,7)
    ((4 2),6)  ((4 4),8)  (7,7)
    (((4 2) (4 4)),14)  (7,7)
    ((((4 2) (4 4)) 7),21)



The second simplification is to start off with the same input as above, and to follow 
step 1 of the function build, but again not to move the result. Here is the computation 
for the same input as above: 


    (4,4)  (2,2)  (4,4)  (4,4)  (7,7)
    (4,4)  ((2 4),6)  (4,4)  (7,7)
    ((4 (2 4)),10)  (4,4)  (7,7)
    ((4 (2 4)),10)  ((4 7),11)
    (((4 (2 4)) (4 7)),21)

What are the costs of these two trees? Compute gwa [4,2,4,4,7] and show that it 
has a lower cost. 

Exercise 14.15 Show that the Garsia-Wachs algorithm does not work when $\circ$ and
$\bullet$ are both $\times$.


Answers 


Answer 14.1 The five ways are 

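$$X_1 \otimes (X_2 \otimes (X_3 \otimes X_4))$$
$$X_1 \otimes ((X_2 \otimes X_3) \otimes X_4)$$
$$(X_1 \otimes X_2) \otimes (X_3 \otimes X_4)$$
$$(X_1 \otimes (X_2 \otimes X_3)) \otimes X_4$$
$$((X_1 \otimes X_2) \otimes X_3) \otimes X_4$$

There are 14 ways to bracket five terms.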
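
The counts can be checked mechanically. The following is a minimal sketch rather than code from the text: the type Bracketing and the functions bracketings and render are hypothetical names introduced only for this illustration, and `*` stands in for the operator of the exercise. Each bracketing of a list with at least two operands is obtained by choosing a top-level split point and bracketing both halves.

```haskell
-- A bracketing is either a single operand or a bracketed pair (hypothetical type).
data Bracketing a = Operand a | Op (Bracketing a) (Bracketing a)

-- All bracketings of a list of operands, keeping the operands in order.
bracketings :: [a] -> [Bracketing a]
bracketings []  = []
bracketings [x] = [Operand x]
bracketings xs  =
  [ Op l r
  | k <- [1 .. length xs - 1]          -- position of the top-level split
  , let (ys, zs) = splitAt k xs
  , l <- bracketings ys
  , r <- bracketings zs ]

render :: Bracketing String -> String
render (Operand x) = x
render (Op l r)    = "(" ++ render l ++ " * " ++ render r ++ ")"

main :: IO ()
main = do
  mapM_ (putStrLn . render) (bracketings ["X1", "X2", "X3", "X4"])
  print (length (bracketings "abcde"))
```

The enumeration takes exponential time, which is harmless for four or five operands.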


Answer 14.2 Rewriting the right-hand side gives us

$$T(n) = n + 2 \sum_{k=1}^{n-1} T(k)$$

Setting $T(n) = (f(n) - 1)/2$ to get rid of the $n$ on the right, we have

$$\frac{f(n) - 1}{2} = 1 + \sum_{k=1}^{n-1} f(k)$$

which is solved by taking $f(n) = 3^n$.
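
As a sanity check on the closed form, the recurrence can be evaluated directly and compared with $(3^n - 1)/2$ for small $n$. This is only a sketch: the names ts, t, and closedForm are introduced here for illustration and do not appear in the text.

```haskell
-- ts !! (n-1) holds T(n); the list is defined lazily in terms of itself,
-- so each value is computed once.
ts :: [Integer]
ts = map go [1 ..]
  where
    go :: Int -> Integer
    go 1 = 1
    go n = fromIntegral n + sum [t k + t (n - k) | k <- [1 .. n - 1]]

t :: Int -> Integer
t n = ts !! (n - 1)

closedForm :: Int -> Integer
closedForm n = (3 ^ n - 1) `div` 2

main :: IO ()
main = print (all (\n -> t n == closedForm n) [1 .. 12])
```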

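Answer 14.3 The condition $x \le u \land y \le v \Rightarrow h\,x\,y \le h\,u\,v$ suffices to ensure the
monotonicity condition

$$\mathit{cost}\ u_1 \le \mathit{cost}\ u_2 \;\land\; \mathit{cost}\ v_1 \le \mathit{cost}\ v_2 \;\Rightarrow\; \mathit{cost}\ (\mathit{Node}\ u_1\ v_1) \le \mathit{cost}\ (\mathit{Node}\ u_2\ v_2)$$

and therefore a recursive definition of mct. In a parallel setting we could take h to
return the maximum of two numbers.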


Answer 14.4 In the case i + 1 = j the result would be

    minWith cost [fork i (t ! (i,i)) (t ! (i+1,i+1)),
                  fork (i+1) (t ! (i+1,i+1)) (t ! (i+2,i+1))]

But t ! (i+2,i+1) is not defined.

Answer 14.5 If $i' = i$ and $j = j'$ there is nothing to prove. If $i' = i$ and $j < j'$, then
monotonicity follows from $A \le A \bullet B$, where $A = S(i,j)$ and $B = S(j+1,j')$. Dually,
if $i' < i$ and $j = j'$, then monotonicity follows from $B \le A \bullet B$, where $A = S(i',i)$
and $B = S(i+1,j)$. Finally, if $i' < i$ and $j < j'$, then monotonicity follows from
$B \le A \bullet B \bullet C$, where $A = S(i',i)$, $B = S(i+1,j)$, and $C = S(j+1,j')$. But this last
condition follows from the first two.

Answer 14.6 If $i' = i$ or $j = j'$, then the result is immediate. Otherwise, we set
$A = S(i,i')$, $B = S(i'+1,j)$, and $C = S(j+1,j')$. Then the result follows from

$$(A \bullet B) + (B \bullet C) \le (A \bullet B \bullet C) + B$$

Answer 14.7 The quadrangle inequality reads

$$A \max B + B \max C \le A \max B \max C + B$$

But if $B < C < A$ this simplifies to $A + C \le A + B$, which is false.

Answer 14.8 Monotonicity holds because $m \le m + mn + n$ and the quadrangle
inequality

$$a + ab + b + b + bc + c \le a + a(b + bc + c) + b + bc + c + b$$

simplifies to $0 \le a(b+1)c$, which also holds.
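
Both answers can be spot-checked by brute force over small nonnegative values. The sketch below tests the sufficient condition of Exercise 14.6; the names qi, counterexamples, and plusTimes are hypothetical and introduced only for this check, and the search range 0 to 6 is an arbitrary choice.

```haskell
-- (A . B) + (B . C) <= (A . B . C) + B for an associative operator op.
qi :: (Int -> Int -> Int) -> Int -> Int -> Int -> Bool
qi op a b c = (a `op` b) + (b `op` c) <= ((a `op` b) `op` c) + b

-- Triples in a small range that violate the condition.
counterexamples :: (Int -> Int -> Int) -> [(Int, Int, Int)]
counterexamples op =
  [(a, b, c) | a <- [0 .. 6], b <- [0 .. 6], c <- [0 .. 6], not (qi op a b c)]

-- The operator of Exercise 14.8.
plusTimes :: Int -> Int -> Int
plusTimes m n = m + m * n + n

main :: IO ()
main = do
  print ((3, 1, 2) `elem` counterexamples max)   -- an instance of B < C < A
  print (null (counterexamples plusTimes))
```

A finite search of this kind can refute a condition but cannot prove it; the algebraic argument above is still needed for Exercise 14.8.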

Answer 14.9 In the case $(\bullet) = (+)$, monotonicity simplifies to $A \le X + A$, and the
remaining conditions simplify to $A + X + C \le X + A + C$. Both conditions hold for
nonnegative numbers. In the case $(\bullet) = (\times)$, monotonicity simplifies to $A \le XA$,
which holds, but the remaining conditions simplify to $A + XC \le XA + C$, which
does not hold for all $A$, $X$, and $C$.

Answer 14.10 The proof of

    (xs (++) ys) (++) zs = xs (++) (ys (++) zs)

is by induction on xs. The base case is easy, and for the induction step we can argue

      ((x:xs) (++) ys) (++) zs
    =   { definition }
      (ys ++ [x] ++ (xs (++) reverse ys)) (++) zs
    =   { first equation, with rs = rev ys zs }
      (ys (++) zs) ++ [x] ++ ((xs (++) reverse ys) (++) rs)
    =   { induction }
      (ys (++) zs) ++ [x] ++ (xs (++) (reverse ys (++) rs))
    =   { second equation }
      (ys (++) zs) ++ [x] ++ (xs (++) reverse (ys (++) zs))
    =   { definition }
      (x:xs) (++) (ys (++) zs)
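
The two equations and the associativity claim can be tested on small lists. In this sketch the product is written as an ordinary function bp, a name chosen here for illustration. Its second clause is the "definition" step used in the calculation above; the first clause, which returns ys for an empty left argument, is an assumption about the base case. The test data in lists is arbitrary.

```haskell
-- Boustrophedon product as a named function (bp is a hypothetical name).
bp :: [a] -> [a] -> [a]
bp []       ys = ys                                 -- assumed base case
bp (x : xs) ys = ys ++ [x] ++ bp xs (reverse ys)

rev :: [a] -> [b] -> [b]
rev xs zs = if even (length xs) then reverse zs else zs

lists :: [[Int]]
lists = [[], [1], [1, 2], [1, 2, 3], [4, 5, 6, 7]]

firstEq, secondEq, assoc :: Bool
firstEq  = and [ bp (xs ++ [0] ++ ys) zs == bp xs zs ++ [0] ++ bp ys (rev xs zs)
               | xs <- lists, ys <- lists, zs <- lists ]
secondEq = and [ reverse (bp ys zs) == bp (reverse ys) (rev ys zs)
               | ys <- lists, zs <- lists ]
assoc    = and [ bp (bp xs ys) zs == bp xs (bp ys zs)
               | xs <- lists, ys <- lists, zs <- lists ]

main :: IO ()
main = do
  print (bp [1, 2] [7, 8, 9 :: Int])
  print (firstEq, secondEq, assoc)
```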

Answer 14.11 By definition of $r$, we have $C_q(i,j) \ge C_r(i,j)$ for $i \le q < r$, so (14.4)
gives

$$0 \le C_q(i,j) - C_r(i,j) \le C_q(i,j+1) - C_r(i,j+1)$$

and (14.5) gives

$$0 \le C_q(i,j) - C_r(i,j) \le C_q(i+1,j) - C_r(i+1,j)$$

Answer 14.12 The definition is

    endstep :: [Pair] -> Tree Label
    endstep [(t,_)]  = t
    endstep (x:y:xs) = endstep (insert (join x y) xs)

Answer 14.13 The input is a two-sorted list, so the first pair is combined at the first
step, giving the list

Again this list is two-sorted. It follows that build takes $\Theta(k^2)$ steps in the worst case.

Answer 14.14 The cost of the first tree is 49, and the cost of the second is 51. We 
have 

gwa [4,2,4,4,7] = ((4 (2 4)) (4 7)) 

with cost 48. 

Answer 14.15 For example, with input [5,10,6,8,7] the Garsia-Wachs algorithm 
produces the tree ((5 10) (7 (6 8))) with cost 17234, while an optimum tree is 
(((5 10) 6) (8 7)) with cost 17206. 



PART SIX 


EXHAUSTIVE SEARCH 






Sometimes there seems to be no better approach than to examine every possible 
candidate in order to find one with a particular property or to show that none exists. 
That, in essence, is exhaustive search. Many of the algorithms we have met so far 
started life as exhaustive search algorithms. By exploiting various monotonicity 
conditions they were then transformed into more efficient alternatives - greedy, 
thinning, or dynamic-programming algorithms, algorithms whose running times 
were typically a low-order polynomial in the size of the input. 

However, for many problems, even quite simply stated ones, no algorithm with 
a guaranteed polynomial running time is known. For example, there is no known 
algorithm for determining the factors of a positive integer that takes polynomial time 
in the number of its digits. There is such an algorithm for determining whether an 
integer is prime or not, but the method is non-constructive and gives no hint of what 
the potential divisors might be. The problems tackled in the remainder of this book 
fall into a similar category of ignorance, and the algorithms we will describe all take 
greater than polynomial time in the worst case. In fact most will take exponential 
time. 

The main problem with an exponential-time algorithm is that it severely limits
the sizes of problem instances that can be solved. Take, say, an algorithm whose
running time is $O(2^n)$ in the worst case. If we can improve the algorithm to one with
a thousand-fold increase in speed, then the size of problem that can be tackled in
the same allotted time increases from n to n + 10, while a quadratic-time algorithm
allows an increase in problem size from n to about 30n. That is a big difference.
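
The two figures follow from a short calculation, sketched here under the assumption that the running times are exactly proportional to $2^n$ and $n^2$. The symbols $k$ and $c$ are introduced only for this calculation: $k$ is the additive gain in problem size for the exponential algorithm and $c$ the multiplicative gain for the quadratic one.

```latex
\[
2^{\,n+k} = 1000 \cdot 2^{\,n} \;\Longrightarrow\; k = \log_2 1000 \approx 9.97,
\qquad
(c\,n)^2 = 1000\, n^2 \;\Longrightarrow\; c = \sqrt{1000} \approx 31.6 .
\]
```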

Even when faced with a potentially exhaustive search, there are still a number of 
ways in which to squeeze as much efficiency as possible out of the process. One 
avenue of attack is to arrange the generation of candidates in such a way that the 
transition from one candidate to the next is as fast as possible. It may be possible 
to postpone exploration of less likely paths in favour of those paths that some 
heuristic deems to be more likely to lead to a solution. The choice of representation 
of the candidates can be tuned to minimise the total amount of space required by 
an exhaustive search. Finally, low-level implementations of the basic steps can 
sometimes be found to make them as fast as possible. Functional languages are at a 
disadvantage as far as the last two aspects are concerned because the use of space is 
difficult to control in a purely functional setting, and implementations that depend 
on low-level memory operations often cannot be described without introducing 
procedural features such as mutable arrays into the language. 

Most of the problems we will discuss in the following two chapters deal with 
games and puzzles of various kinds. Apart from being intriguing and fun to study, 
puzzles provide a fertile ground for looking at exhaustive search, if only for the 
reason that a good puzzle should be one in which there is no obvious route to a 
solution. 

As every good detective knows, an exhaustive search can be organised in different
ways. The present chapter introduces the two main variations, depth-first search and 
breadth-first search. These different ways of searching are illustrated with the help 
of games and puzzles of various kinds. Good detectives also know how to prioritise 
certain lines of enquiry, in the hope that they will prove to be more fruitful than 
others. One general way of doing so is embodied in heuristic search, a topic we will 
consider in the following chapter. Another way is to formulate possible plans for 
achieving a given goal and then to try each plan in turn until one is successful. One 
example of a planning algorithm is considered in the final section of this chapter. 
We begin, however, with two examples in which the nature of the search is not made 
explicit. 


15.1 Implicit search and the n-queens problem 

Sometimes the set of candidates can be described directly, so let us start out with 
the simple idea of an exhaustive search based on the pattern 

    solutions = filter good . candidates

The function candidates generates a list of possible candidates from some given 
data, and the filter operation extracts those that are ‘good’. No particular search 
method is explicit in this formulation because it depends on the precise way in 
which the list of possible candidates is generated. 

The pattern above returns all the good candidates, but to find just one - assuming 
of course that one exists - we can use the idiom 

    solution = head . solutions

There is no loss of efficiency in extracting a single solution by this device because, 
under lazy evaluation, only the first element of solutions will ever be computed. 
This simple idea, known as the list of successes technique, was recognised early 
on as a useful aspect of a lazy functional programming language such as Haskell.
However, as we will see below, it does not follow that the work for finding any
one of n possible solutions takes 1/n of the total time; depending on the precise
definition of candidates, it may take nearly as much time to find the first solution as
to find all of them.
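
To see the list of successes idiom at work, here is a small sketch with an infinite list of candidates. The particular problem, finding perfect squares whose decimal representation ends in 44, is a made-up example and not from the text; only the shape of solutions and solution follows the pattern above. Because of lazy evaluation, solution returns as soon as the first good candidate is found, even though solutions denotes an infinite list.

```haskell
-- Hypothetical candidates: every integer from a given starting point.
candidates :: Integer -> [Integer]
candidates from = [from ..]

-- Hypothetical test: a perfect square ending in the digits 44.
good :: Integer -> Bool
good k = k `mod` 100 == 44 && isSquare k
  where
    isSquare m = let r = floor (sqrt (fromIntegral m :: Double)) in r * r == m

solutions :: Integer -> [Integer]
solutions = filter good . candidates

solution :: Integer -> Integer
solution = head . solutions

main :: IO ()
main = do
  print (solution 1)
  print (take 3 (solutions 1))
```

The square-root test goes through Double, so it is reliable only for the small numbers used here.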

As a first example we will look at a well-known puzzle whose history goes back
over 150 years. The puzzle is to arrange n queens on an $n \times n$ chessboard so that no
queen attacks any other. Each queen therefore has to be placed on the board in a
different row, column, and diagonal from any other queen. (Chess players refer to
ranks and files rather than rows and columns, but we will stick with the standard
matrix nomenclature with rows labelled from top to bottom and columns from left to
right.) The first two constraints imply that any solution is necessarily a permutation
of the numbers 1 to n in which the jth element is the number of the column in which
the queen in row j is placed. For example, for the 8-queens problem there are 92
solutions, of which one is 15863724. The queen in the first row is in column 1, the
queen in the second row is in column 5, and so on. Each solution is therefore a
permutation of 1 to n in which no queen attacks any other along a diagonal. Thus,
for each queen at position $(r,q)$ there can be no other queen at any position $(r',q')$
for which $r + q = r' + q'$ or $r - q = r' - q'$. Each left diagonal (top left to bottom
right) is identified by coordinates with a common difference and each right diagonal
(top right to bottom left) by coordinates with a common sum. The diagonal safety
condition can be implemented by

    safe :: [Nat] -> Bool
    safe qs = check (zip [1..] qs)

    check [] = True
    check ((r,q):rqs) = and [abs (q - q') /= r' - r | (r',q') <- rqs] && check rqs

Now we can simply write

    queens :: Nat -> [[Nat]]
    queens = filter safe . perms

where perms n generates all permutations of 1 to n. One efficient definition of this
function was given in the very first chapter:

    perms n = foldr (concatMap . inserts) [[]] [1..n]
      where inserts x []     = [[x]]
            inserts x (y:ys) = (x:y:ys) : map (y:) (inserts x ys)

With very little effort it seems we have arrived at a reasonable program for solving
the puzzle. However, this is a very bad way of solving the problem. Generating the
permutations of 1 to n takes $\Theta(n \times n!)$ steps, and each safety test takes $\Theta(n^2)$ steps,
so the full algorithm takes $\Theta(n^2 \times n!)$ steps because the safety test has to be applied
to $n!$ permutations.
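
For experimentation, the definitions above can be collected into one file. This is a sketch with two additions that are not in the text: Nat is modelled as Int, and a main function prints a couple of checks.

```haskell
type Nat = Int   -- assumption: the text's Nat is modelled by Int

perms :: Nat -> [[Nat]]
perms n = foldr (concatMap . inserts) [[]] [1 .. n]
  where
    inserts x []       = [[x]]
    inserts x (y : ys) = (x : y : ys) : map (y :) (inserts x ys)

-- No two queens share a diagonal.
safe :: [Nat] -> Bool
safe qs = check (zip [1 ..] qs)

check :: [(Nat, Nat)] -> Bool
check []             = True
check ((r, q) : rqs) = and [abs (q - q') /= r' - r | (r', q') <- rqs] && check rqs

-- Generate every permutation, then filter: simple but slow.
queens :: Nat -> [[Nat]]
queens = filter safe . perms

main :: IO ()
main = do
  print (length (queens 6))
  print ([1, 5, 8, 6, 3, 7, 2, 4] `elem` queens 8)
```

Even at n = 8 the program inspects all 40320 permutations, which is the inefficiency the following paragraphs set out to remove.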

A better idea is to generate only those permutations that can be extended to safe 
permutations. The idea is to exploit the following property of safe: 

    safe (qs ++ [q]) = safe qs && newDiag q qs

where

    newDiag q qs = and [abs (q - q') /= r - r' | (r',q') <- zip [1..] qs]
      where r = length qs + 1

The test newDiag ensures that the next queen is placed on a fresh diagonal. However,
it is difficult to make use of this property with the above definition of perms
because new elements are inserted into the middle of previously generated partial 
permutations. Instead we can use another definition of perms: 

    perms n = help n where
      help 0 = [[]]
      help r = [xs ++ [x] | xs <- help (r-1), x <- [1..n], notElem x xs]

The difference between this definition and the previous one is that each new element 
is added to the end of a previous permutation, not somewhere in the middle. That 
means we can fuse part of filter safe into the generation of permutations to arrive at 

    queens1 n = help n where
      help 0 = [[]]
      help r = [qs ++ [q] | qs <- help (r-1), q <- [1..n],
                            notElem q qs, newDiag1 (r,q) qs]

    newDiag1 (r,q) qs = and [abs (q - q') /= r - r' | (r',q') <- zip [1..] qs]

The safety of previously placed queens is guaranteed by construction. The test
newDiag1 takes only $O(n)$ steps and the resulting search is faster by a factor of n.

There is a dual solution in which the order of the generators is swapped and new 
elements are added to the front of a previous permutation rather than to the rear: 

    queens2 n = help n where
      help 0 = [[]]
      help r = [q:qs | q <- [1..n], qs <- qss, notElem q qs, newDiag2 q qs]
        where qss = help (r-1)

    newDiag2 q qs = and [abs (q - q') /= r' - 1 | (r',q') <- zip [2..] qs]

The computation of help (r — 1) is brought out in a where clause, for otherwise it 
would be recomputed for each possible placement. A revised version of newDiag is 
needed because queens are now added to the front of a list. 
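
The two versions can be compared side by side. The sketch below repeats both definitions in a self-contained file; modelling Nat as Int and the main function are additions made here, not part of the text.

```haskell
type Nat = Int   -- assumption: the text's Nat is modelled by Int

-- New queens are appended at the rear, in row r.
queens1 :: Nat -> [[Nat]]
queens1 n = help n
  where
    help 0 = [[]]
    help r = [ qs ++ [q]
             | qs <- help (r - 1), q <- [1 .. n]
             , notElem q qs, newDiag1 (r, q) qs ]

newDiag1 :: (Nat, Nat) -> [Nat] -> Bool
newDiag1 (r, q) qs = and [abs (q - q') /= r - r' | (r', q') <- zip [1 ..] qs]

-- New queens are added at the front, in row 1; older queens move to rows 2, 3, ...
queens2 :: Nat -> [[Nat]]
queens2 n = help n
  where
    help 0 = [[]]
    help r = [q : qs | q <- [1 .. n], qs <- qss, notElem q qs, newDiag2 q qs]
      where qss = help (r - 1)   -- shared across all choices of q

newDiag2 :: Nat -> [Nat] -> Bool
newDiag2 q qs = and [abs (q - q') /= r' - 1 | (r', q') <- zip [2 ..] qs]

main :: IO ()
main = do
  print (queens1 8 == queens2 8)
  print (length (queens1 9))
  print (head (queens1 9))
```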

The two functions queens1 and queens2 generate exactly the same solutions in
exactly the same order, so which is better? It might be thought that queens2 should
be faster than queens1, if only because the operation of adding a queen to the front
of a list is a constant-time operation, while adding a queen to the rear takes linear
time. Indeed, the second version is faster when all the solutions are computed. But
the situation changes dramatically when we want just the first solution. For example,
computing the first element of queens2 9 is much slower than computing the first
element of queens1 9. To see why, consider the first solution 136824975 (out of 352
possible solutions). This solution is computed from left to right by queens1, and the
first elements of the partial permutations are generated as follows:

1, 13, 135, 1352, 13524, 135249, 1357246, 13682497, 136824975 

Most of the work takes place in generating the seventh and eighth partial permutation, 
since 135249 cannot be extended on the right to a solution and neither can 1357246. 
In each case that means more partial permutations have to be generated before 
finding one that can be extended. The remaining partial permutations can be easily
extended and so require far less work. 

Contrast this effort with that required by queens2, which computes from right to
left and generates exactly the same list of partial permutations. This time much more
work is done at each step in order to find the next partial permutation that begins
with a 1. For example, the partial permutation 13524 has to be replaced by 35249 in
order to allow a 1 to be added to the front, and this involves generating about 400
intermediate permutations. The same phenomenon is exhibited at each step of the
process, causing queens2 to perform a lot more work than queens1 before it returns
the first solution. Of course, it performs correspondingly less work in computing the
remaining solutions. The lesson of the story is that the order in which the choices
for the next move are made can significantly influence the running time for finding
the first solution.

Here is another solution to the n-queens problem, one in which the search strategy 
is made explicit. The general idea, which we will revisit later when we discuss 
depth-first and breadth-first search, is to reformulate the search in terms of two finite 
sets, a set of states and a set of moves, and three functions 

    moves  :: State -> [Move]
    move   :: State -> Move -> State
    solved :: State -> Bool

The function moves determines the legal moves that can be made in a given state,
and move returns the state that results when a given move is made. The function
solved determines which states are a solution to the puzzle. Phrased in this way,
the problem is essentially one of searching a directed graph in which the vertices
represent states and the edges represent moves.

The following algorithm for listing the set of solved states works only under 
certain assumptions, given below: 

    solutions :: State -> [State]
    solutions t = search [t]

    search :: [State] -> [State]
    search [] = []
    search (t:ts) = if solved t then t : search ts else search (succs t ++ ts)

    succs :: State -> [State]
    succs t = [move t m | m <- moves t]

In words, if the current state is not a solved state, then its successors are added to 
the front of the states waiting to be explored. This way of dealing with successor 
states is typical of depth-first search. Regarding the assumptions, the major one is 
that the underlying graph is acyclic, for otherwise search would loop indefinitely if 
any state is repeated. The second assumption is that no further moves are possible 
in any solved state, for otherwise some solved states would be missed, and the third 
is that no state can be reached by more than one path, for otherwise some solved 
states would be listed more than once. 

These three assumptions are all satisfied in the n-queens problem, and we can 
immediately install the definitions 

    type State = [Nat]
    type Move  = Nat

    moves :: State -> [Move]
    moves qs = [q | q <- [1..n], notElem q qs, newDiag2 q qs]

    move :: State -> Move -> State
    move qs q = q:qs

    solved :: State -> Bool
    solved qs = (length qs == n)

The function newDiag2 was defined above. A state is solved if it is a full permutation
of 1..n, and the set of moves consists of the legal positions at which the next queen
can be placed. The resulting algorithm is as fast as queens1 in finding the first
solution.
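
The pieces can be put together as follows. This is a sketch under two assumptions that go beyond the text: the board size n is passed as an argument, with the helper functions made local so that they can refer to it, and the search starts from the empty board.

```haskell
type Nat   = Int    -- assumption: Nat modelled by Int
type State = [Nat]
type Move  = Nat

-- Depth-first search for all solved states, starting from the empty board.
solutions :: Nat -> [State]
solutions n = search [[]]
  where
    search []       = []
    search (t : ts) = if solved t then t : search ts else search (succs t ++ ts)

    succs t   = [move t m | m <- moves t]
    moves qs  = [q | q <- [1 .. n], notElem q qs, newDiag2 q qs]
    move qs q = q : qs
    solved qs = length qs == n

newDiag2 :: Nat -> [Nat] -> Bool
newDiag2 q qs = and [abs (q - q') /= r' - 1 | (r', q') <- zip [2 ..] qs]

main :: IO ()
main = do
  print (length (solutions 8))
  print (reverse (head (solutions 9)))
```

Since each move adds a queen to the front of the state, the first queen placed ends up last in the list; reversing the first solved state shows the queens in the order in which they were chosen.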

The definition of search can be modified to count only the number of solutions. 
Counting the number of solutions to the n-queens problem is a time-consuming 
operation. For example, in an experiment in 2006 it took 26613 days of CPU time 
to count the number of solutions to the 25-queens problem, which turns out to be 
2207893435808352. No-one currently knows how many solutions there are when 
n = 28. Nevertheless, we can try to put on speed with the algorithm above by using 
a more compact representation of states. We will describe a representation that 
uses three bit vectors. The three vectors determine which left diagonals, columns, 
and right diagonals cannot be used for the next queen. For example, consider the 
5-queens problem, and suppose the last two rows have been filled in as follows,
with row 3 waiting to be filled: 


The three vectors for this state are 11000, 01010, and 00100. The first, 11000, 
determines which left-to-right diagonals are attacked by a queen. We cannot place 
a queen in either column 1 or column 2 because it would be under attack by an 
existing queen along a left-to-right diagonal. The middle vector, 01010, determines 
which columns are attacked, and the third, 00100 determines which right-to-left 
diagonals are under attack. The columns that can be used for the next row are 
calculated by taking the complement of the bitwise union of these three sequences: 

    complement (11000 .|. 01010 .|. 00100) = 00001

The bitwise union operator .|. and the complement function are taken from the
Haskell library Data.Bits, as are some further operations described below. The
result 00001 means we can place a queen only in column 5.

As another example, consider the possibilities for placing a queen in row 4 when 
row 5 has a queen in column 4. Here the three relevant vectors are 00100, 00010, 
and 00001. We have 

    complement (00100 .|. 00010 .|. 00001) = 11000

so a queen in row 4 can be placed only in columns 1 and 2. Suppose we choose
column 2 (as in the first example), a choice which is represented by the bit vector 
01000. We can then update the diagonal and column information by 

    shiftL (00100 .|. 01000) 1 = 11000
    00010 .|. 01000            = 01010
    shiftR (00001 .|. 01000) 1 = 00100

These three vectors appeared in the first example. The operation shiftL shifts a bit
vector a designated number of places to the left, introducing trailing 0s. Similarly,
shiftR shifts a bit vector a designated number of places to the right, introducing
leading 0s. In each of the computations above the shift is by just one place. A state
is solved when all the bits in the column vector are 1.
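
The two worked examples can be replayed with Data.Bits. This is a sketch in which the names mask5, free, and place are introduced here for illustration. The 5-bit vectors of the text are held in 16-bit words, so the result of complement is restricted to the five low-order bits by a mask.

```haskell
{-# LANGUAGE BinaryLiterals #-}
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16)

-- Selects the five low-order bits used in the 5-queens examples.
mask5 :: Word16
mask5 = 0b11111

-- Columns still free, given left-diagonal, column, and right-diagonal vectors.
free :: Word16 -> Word16 -> Word16 -> Word16
free lds cls rds = complement (lds .|. cls .|. rds) .&. mask5

-- Update the three vectors after placing a queen in the column marked by m.
place :: (Word16, Word16, Word16) -> Word16 -> (Word16, Word16, Word16)
place (lds, cls, rds) m = (shiftL (lds .|. m) 1, cls .|. m, shiftR (rds .|. m) 1)

main :: IO ()
main = do
  print (free 0b11000 0b01010 0b00100 == 0b00001)
  print (free 0b00100 0b00010 0b00001 == 0b11000)
  print (place (0b00100, 0b00010, 0b00001) 0b01000 == (0b11000, 0b01010, 0b00100))
```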

Haskell provides a number of sizes for bit vectors in the Data.Word library,
including Word8, Word16, Word32, and Word64, each of which is an n-bit unsigned
integer type for n = 8, 16, and so on. We will choose Word16, which will allow us
to solve the n-queens problem for $n \le 16$. For n < 16 we can use a mask to mask
out bits. For example, for n = 5 the mask would be a 16-bit vector all of whose bits
are 0 except for the last five bits, which are all 1. Numerically, the mask is a bit
representation of $2^n - 1$ for $0 \le n \le 16$, so we can define

    mask :: Word16
    mask = 2^n - 1

Recomputing mask at every point would affect the efficiency of the search, so we 
make it local to the complete counting algorithm: 

    type State = (Word16,Word16,Word16)
    type Move  = Word16

    cqueens :: Nat -> Integer
    cqueens n = search [(0,0,0)] where
      search :: [State] -> Integer
      search [] = 0
      search (t:ts) = if solved t then 1 + search ts else search (succs t ++ ts)

      solved :: State -> Bool
      solved (_,cls,_) = (cls == mask)

      mask :: Word16
      mask = 2^n - 1

      succs :: State -> [State]
      succs t = [move t b | b <- moves t]

      move :: State -> Move -> State
      move (lds,cls,rds) m = (shiftL (lds .|. m) 1, cls .|. m, shiftR (rds .|. m) 1)

      moves :: State -> [Move]
      moves (lds,cls,rds) = bits (complement (lds .|. cls .|. rds) .&. mask)

The function bits extracts the bits from a vector as a sequence of bit vectors each
containing a single set bit:

    bits :: Word16 -> [Move]
    bits v = if v == 0 then [] else b : bits (v - b)
      where b = v .&. negate v

See the exercises for an alternative, slightly less efficient definition. For example,

    bits 11010 = [00010, 01000, 10000]

The expression v .&. negate v, where .&. is bitwise conjunction, returns the least
significant bit; for example

    11010 .&. negate 11010 = 11010 .&. 00110 = 00010

Repeatedly subtracting the least significant bit from the vector yields all the bits.
When the counting algorithm was compiled and run, it delivered the fact that the
16-queens problem has 14772512 solutions in close to a minute of CPU time.
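
For readers who want to run the counting algorithm, the fragments above can be assembled into a single file as sketched below. The imports, the synonym Nat = Int, and main are additions made here; the rest follows the definitions in the text.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16)

type Nat   = Int   -- assumption: Nat modelled by Int
type State = (Word16, Word16, Word16)
type Move  = Word16

-- Count the solutions to the n-queens problem for 0 <= n <= 16.
cqueens :: Nat -> Integer
cqueens n = search [(0, 0, 0)]
  where
    search :: [State] -> Integer
    search []       = 0
    search (t : ts) = if solved t then 1 + search ts else search (succs t ++ ts)

    solved (_, cls, _) = cls == mask

    mask :: Word16
    mask = 2 ^ n - 1   -- wraps around to all ones when n = 16

    succs t = [move t b | b <- moves t]

    move (lds, cls, rds) m = (shiftL (lds .|. m) 1, cls .|. m, shiftR (rds .|. m) 1)

    moves (lds, cls, rds) = bits (complement (lds .|. cls .|. rds) .&. mask)

-- Split a vector into its single-bit components, least significant first.
bits :: Word16 -> [Move]
bits v = if v == 0 then [] else b : bits (v - b)
  where b = v .&. negate v

main :: IO ()
main = do
  print (bits 26)                 -- 26 is 11010 in binary
  print (map cqueens [1 .. 8])
```

Small values of n finish instantly; the one-minute figure quoted above for n = 16 refers to the authors' compiled program and will vary with machine and compiler settings.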


15.2 Expressions with a given sum


Here is a different kind of puzzle that can also be solved using the direct approach.
The problem involves constructing arithmetic expressions that evaluate to a given
sum. A simple version of the problem asks for a list of all the ways the operators $\times$
and $+$ can be inserted into a list of digits 1 to 9 so as to make a total of 100. Two
such ways are

    100 = 12 + 34 + 5 × 6 + 7 + 8 + 9
    100 = 1 + 2 × 3 + 4 + 5 + 67 + 8 + 9

In this particular version of the problem, no parentheses are allowed in forming
expressions and, as usual, $\times$ binds more tightly than $+$. Here we can write

    solutions :: Nat -> [Digit] -> [Expr]
    solutions n = filter (good n . value) . expressions

where expressions builds a list of all arithmetic expressions that can be formed from 
a given list of digits, value delivers the value of such an expression, and good tests 
whether the value is equal to a given target value. 

Let’s consider expressions first. Each expression is the sum of a list of terms, 
each term is the product of a list of factors, and each factor is a nonempty list of 
digits. For example, the expression 

    12 + 34 + 5 × 6 + 7 + 8 + 9

can be represented by the compound list 

[[[1,2]], [[3,4]], [[5],[6]], [[7]], [[8]], [[9]]] 

That means we can define expressions, terms, and factors just with the help of 
suitable type synonyms: 

    type Expr   = [Term]
    type Term   = [Factor]
    type Factor = [Digit]
    type Digit  = Nat

One simple way to define expressions follows the earlier definition of perms: 

    expressions :: [Digit] -> [Expr]
    expressions = foldr (concatMap . glue) [[]]

    glue :: Digit -> Expr -> [Expr]
    glue d [] = [[[[d]]]]
    glue d ((ds:fs):ts) = [((d:ds):fs):ts, ([d]:ds:fs):ts, [[d]]:(ds:fs):ts]

To explain glue, observe that only one expression can be built from a single digit d, 
namely [ [ [d] ] ]. An expression built from more than one digit can be decomposed 
into a leading factor, ds say, which is part of a leading term ds :fs, and a remaining 
expression, a list of terms ts. A new digit can be added to the front of an expression
in exactly three ways: by extending the leading factor with the new digit, by starting
a new factor, or by starting a new term. For example, $2 \times 3 + \cdots$ can be extended on
the left with a new digit 1 in one of the following three ways:

    12 × 3 + ···
    1 × 2 × 3 + ···
    1 + 2 × 3 + ···

It is immediate from this definition that there are $6561 = 3^8$ expressions one can
build from nine digits, indeed $3^{n-1}$ expressions from a list of n digits.

The function value can be implemented as a function valExpr, where 

    valExpr :: Expr -> Nat
    valExpr = sum . map valTerm

    valTerm :: Term -> Nat
    valTerm = product . map valFact

    valFact :: Factor -> Nat
    valFact = foldl op 0 where op n d = 10*n + d

Finally, a good expression is one whose value is equal to the target value: 

    good :: Nat -> Nat -> Bool
    good n v = (v == n)

Evaluating solutions 100 [1. .9], and displaying the results in a suitable fashion, 
yields the seven solutions 

    100 = 1 × 2 × 3 + 4 + 5 + 6 + 7 + 8 × 9
    100 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 × 9
    100 = 1 × 2 × 3 × 4 + 5 + 6 + 7 × 8 + 9
    100 = 12 + 3 × 4 + 5 + 6 + 7 × 8 + 9
    100 = 1 + 2 × 3 + 4 + 5 + 67 + 8 + 9
    100 = 1 × 2 + 34 + 5 + 6 × 7 + 8 + 9
    100 = 12 + 34 + 5 × 6 + 7 + 8 + 9

The computation does not take too long, as there are only 6561 possibilities to check. 
However, on another day the target value may be much larger and there may be 
many more digits, so let us see what we can do to optimise the search. 

One obvious step is to memoise value computations to save recomputing values 
from scratch each time. Better still, we can exploit a monotonicity condition to 
achieve a partial fusion of the filter test into the generation of expressions. The 
situation is exactly the same as with the n-queens problem. The key insight is that 
expressions built out of positive digits, using just juxtaposition, ×, and +, have
values that are at least as large as their constituent expressions. A formal statement is
given as an exercise. So we can pair expressions with their values and only generate
expressions whose values are at most the target value. 

A technical difficulty is that we cannot determine the values of a new expression, 
obtained by gluing a new digit to the front, from the values of the digit and expression 
alone; we need the values of the leading factor and the leading term as well. So we 
will define the component values to be

type Values = (Nat, Nat, Nat, Nat)
values :: Expr -> Values
values ((ds : fs) : ts) = (10 ^ length ds, valFact ds, valTerm fs, valExpr ts)

The additional first component of this quadruple is included simply to make the
evaluation of valFact more efficient. The value of an expression whose component
values are (p, f, t, e) is f × t + e.

Here is the revised definition of solutions: 

solutions :: Nat -> [Digit] -> [Expr]
solutions n = map fst . filter (good n) . expressions n

The function expressions n generates expressions whose value is at most n: 

expressions :: Nat -> [Digit] -> [(Expr, Values)]
expressions n = foldr (concatMap . glue) [([], undefined)]
  where glue d = filter (ok n) . extend d

extend d ([], _) = [([[[d]]], (10, d, 1, 0))]
extend d ((ds : fs) : ts, (p, f, t, e)) =
  [(((d : ds) : fs) : ts, (10 * p, p * d + f, t, e)),
   (([d] : ds : fs) : ts, (10, d, f * t, e)),
   ([[d]] : (ds : fs) : ts, (10, d, 1, f * t + e))]

Finally, the tests good and ok are defined by 

good n (ex, (p, f, t, e)) = (f * t + e == n)
ok n (ex, (p, f, t, e)) = (f * t + e <= n)

The result is a program for solutions that is many times faster than the first version.
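The pieces of the revised program can be assembled into one runnable sketch. It assumes Nat is Int and uses undefined for the placeholder values paired with the empty expression, which are never inspected. The direct evaluator valExpr is included only to cross-check the paired values, and the final extend clause is an addition for totality.

```haskell
type Nat    = Int                    -- assumption: Nat modelled as Int
type Digit  = Nat
type Factor = [Digit]
type Term   = [Factor]
type Expr   = [Term]
type Values = (Nat, Nat, Nat, Nat)   -- (10^length ds, valFact ds, valTerm fs, valExpr ts)

solutions :: Nat -> [Digit] -> [Expr]
solutions n = map fst . filter (good n) . expressions n

-- Generate only those expressions whose value is at most n.
expressions :: Nat -> [Digit] -> [(Expr, Values)]
expressions n = foldr (concatMap . glue) [([], undefined)]
  where glue d = filter (ok n) . extend d

-- Add digit d to the front, updating the component values incrementally.
extend :: Digit -> (Expr, Values) -> [(Expr, Values)]
extend d ([], _) = [([[[d]]], (10, d, 1, 0))]
extend d ((ds : fs) : ts, (p, f, t, e)) =
  [ (((d : ds) : fs) : ts, (10 * p, p * d + f, t, e))   -- longer leading factor
  , (([d] : ds : fs) : ts, (10, d, f * t, e))           -- new factor
  , ([[d]] : (ds : fs) : ts, (10, d, 1, f * t + e))     -- new term
  ]
extend _ ([] : _, _) = []            -- unreachable: terms are never empty

good, ok :: Nat -> (Expr, Values) -> Bool
good n (_, (_, f, t, e)) = f * t + e == n
ok   n (_, (_, f, t, e)) = f * t + e <= n

-- Direct evaluator, used only as an independent check.
valExpr :: Expr -> Nat
valExpr = sum . map (product . map (foldl (\m d -> 10 * m + d) 0))
```

Every expression returned by solutions 100 [1 .. 9] should evaluate to 100 under valExpr, which gives a simple way to confirm that the incremental bookkeeping in extend agrees with the original evaluator.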


15.3 Depth-first and breadth-first search 

In Section 15.1 we implemented a simple version of depth-first search that used 
three functions: 

moves :: State -> [Move]
move :: State -> Move -> State
solved :: State -> Bool

The search, which produced a list of all the solved states, was valid provided three
assumptions were satisfied, the main one being that the underlying digraph was
acyclic. However, in many applications this assumption does not hold. It is perfectly
possible that some sequence of moves can lead to a state being repeated, so the
associated digraph will contain cycles. We will assume, though, that move t m ≠ t
for all states t and moves m, so the graph does not contain loops. The second
assumption, namely that final states cannot be arrived at by more than one sequence
of moves, is not needed if we want to enumerate all the sequences of moves that
lead to solved states rather than the solved states themselves. The third assumption
was that no further moves are possible in a solved state, a reasonable restriction we
will continue to assume.

Let us therefore consider how to implement a function 

solutions :: State -> [[Move]]

for computing all simple sequences of moves that lead to a solved state. A sequence 
of moves is simple if no intermediate state is repeated during the moves. Without 
this restriction the set of solutions could be infinite. To maintain the restriction, we 
need to remember both the sequence of moves in a path and the list of intermediate 
states, including the initial state, that arises as a result of making the moves. Hence 
we define 

type Path = ([Move], [State])

where the second component of a path is a nonempty list of states. The simple 
successors of a path are defined by 

succs :: Path -> [Path]
succs (ms, t : ts) = [(ms ++ [m], t' : t : ts)
                     | m <- moves t, let t' = move t m, notElem t' ts]

The intermediate states in a path are recorded from right to left. That means a path 
leads to a final state defined by 

final :: Path -> State
final = head . snd

Next, the function paths takes a list of simple paths and produces all the possible 
completions. Here are two ways to define paths: 

paths1 :: [Path] -> [Path]
paths1 = concat . takeWhile (not . null) . iterate (concatMap succs)

paths2 :: [Path] -> [Path]
paths2 ps = concat [p : paths2 (succs p) | p <- ps]

In paths1 the list of paths is repeatedly extended by applying succs until no more
extensions are possible. Under this definition, the simple paths are generated in
ascending order of length. In paths2 each path is followed immediately by its
successors, so paths are not necessarily produced in ascending order of length. We
will rewrite these two definitions in a moment. Now we can define solutions1 by

solutions1 :: State -> [[Move]]
solutions1 = map fst . filter (solved . final) . paths1 . start

The initial state is converted into a singleton containing the empty path:

start :: State -> [Path]
start t = [([], [t])]

The function paths1 enumerates all simple paths, and the result is filtered for those
paths that lead to a solved state, which are then processed to produce the moves.
The definition of solutions2 is the same but with paths1 replaced by paths2.
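To see the two enumeration orders side by side, here is a minimal sketch on a made-up four-state digraph with a cycle between states 0 and 2. The graph, and the convention that a move simply names its target state, are assumptions chosen for illustration; they are not from the text.

```haskell
type State = Int
type Move  = Int                      -- hypothetical: a move names its target state
type Path  = ([Move], [State])

-- Toy digraph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 2 -> 3; state 3 is solved.
moves :: State -> [Move]
moves 0 = [1, 2]
moves 1 = [2]
moves 2 = [0, 3]
moves _ = []

move :: State -> Move -> State
move _ m = m

solved :: State -> Bool
solved = (== 3)

-- Simple successors: never revisit a state already on the path.
succs :: Path -> [Path]
succs (ms, t : ts) = [ (ms ++ [m], t' : t : ts)
                     | m <- moves t, let t' = move t m, notElem t' ts ]
succs (_, [])      = []

final :: Path -> State
final = head . snd

paths1, paths2 :: [Path] -> [Path]
paths1 = concat . takeWhile (not . null) . iterate (concatMap succs)
paths2 ps = concat [p : paths2 (succs p) | p <- ps]

start :: State -> [Path]
start t = [([], [t])]

solutions1, solutions2 :: State -> [[Move]]
solutions1 = map fst . filter (solved . final) . paths1 . start
solutions2 = map fst . filter (solved . final) . paths2 . start
```

On this graph solutions1 0 lists the two-move route before the three-move one, while solutions2 0 lists them the other way round; the move back to state 0 is never taken because it would repeat a state.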

The two definitions of paths can be rewritten with the help of a little calculation. 
First consider the expression 

exp = foldr f e . takeWhile p . iterate g

An easy calculation, left as an exercise, leads to the equivalent recursive definition

exp x = if p x then f x (exp (g x)) else e

Hence paths1 can be put in the form

paths1 ps = if null ps then [] else ps ++ paths1 (concatMap succs ps)

We can now show that

paths1 (ps ++ qs) = ps ++ paths1 (qs ++ concatMap succs ps)

for all ps and qs. The proof is by induction on ps. The base case is immediate, and 
for the induction step we argue 

  paths1 (p : ps ++ qs)
= { definition of paths1 }
  p : ps ++ qs ++ paths1 (concatMap succs (p : ps ++ qs))
= { definition of concatMap }
  p : ps ++ qs ++ paths1 (succs p ++ concatMap succs (ps ++ qs))
= { introducing ps' = ps ++ qs and qs' = succs p }
  p : ps' ++ paths1 (qs' ++ concatMap succs ps')
= { induction, expanding the abbreviation }
  p : paths1 (ps ++ qs ++ succs p)
= { introducing qs'' = qs ++ succs p }
  p : paths1 (ps ++ qs'')
= { induction again, expanding the abbreviation }
  p : ps ++ paths1 (qs ++ succs p ++ concatMap succs ps)
= { definition of concatMap }
  p : ps ++ paths1 (qs ++ concatMap succs (p : ps))

This completes the proof. In particular, setting (ps, qs) = ([p], ps) we obtain

paths1 (p : ps) = p : paths1 (ps ++ succs p)

That means solutions1 can be rewritten in the form

solutions1 = search . start
  where search [] = []
        search ((ms, t : ts) : ps)
          | solved t = ms : search ps
          | otherwise = search (ps ++ succs (ms, t : ts))

The only assumption is that no moves are possible in solved states. This method is
known as breadth-first search (BFS). In BFS, the frontier - the list of paths waiting
to be explored further - is maintained as a queue, with new entries added to the
rear of the queue. What the above calculation demonstrates is that BFS does indeed
produce solutions in ascending order of length of path.

Turning to paths2, we can reason

  paths2 (p : ps)
= { definition of paths2 }
  concat [p' : paths2 (succs p') | p' <- p : ps]
= { definition of concat and paths2 }
  p : paths2 (succs p) ++ paths2 ps
= { since concat (xss ++ yss) = concat xss ++ concat yss }
  p : paths2 (succs p ++ ps)

Hence we obtain the following alternative definition of solutions2:

solutions2 = search . start
  where search [] = []
        search ((ms, t : ts) : ps)
          | solved t = ms : search ps
          | otherwise = search (succs (ms, t : ts) ++ ps)

This method is known as depth-first search (DFS). This time, the frontier is managed
as a stack, with new entries added to the front of the stack. With DFS, the solutions
are not produced in ascending order of length, though all solutions will still be
produced. These two definitions of solutions are not quite the usual ways in which
DFS and BFS are described (see below), but it is instructive that both can be derived
from clear specifications.
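Since the two searches differ only in where new paths join the frontier, they can be written as one function parameterised by that choice. The sketch below does so on a hypothetical puzzle: five states arranged in a ring, moves clockwise and counter-clockwise, with state 3 solved. The names searchWith, bfs, and dfs, and the ring puzzle itself, are assumptions for illustration.

```haskell
data Move = Cw | Ccw deriving (Eq, Show)   -- hypothetical moves on a ring of 5 states
type State = Int
type Path  = ([Move], [State])

solved :: State -> Bool
solved = (== 3)

-- No moves are possible in a solved state, as the text assumes.
moves :: State -> [Move]
moves t = if solved t then [] else [Cw, Ccw]

move :: State -> Move -> State
move t Cw  = (t + 1) `mod` 5
move t Ccw = (t + 4) `mod` 5

succs :: Path -> [Path]
succs (ms, t : ts) = [ (ms ++ [m], t' : t : ts)
                     | m <- moves t, let t' = move t m, notElem t' ts ]
succs (_, [])      = []

-- The argument add decides whether the frontier behaves as a queue or a stack.
searchWith :: ([Path] -> [Path] -> [Path]) -> State -> [[Move]]
searchWith add t0 = search [([], [t0])]
  where
    search [] = []
    search ((ms, t : ts) : ps)
      | solved t  = ms : search ps
      | otherwise = search (add ps (succs (ms, t : ts)))
    search ((_, []) : ps) = search ps

bfs, dfs :: State -> [[Move]]
bfs = searchWith (\ps new -> ps ++ new)   -- new paths join the rear (queue)
dfs = searchWith (\ps new -> new ++ ps)   -- new paths join the front (stack)
```

Starting from state 0, bfs reports the two-move counter-clockwise route first, whereas dfs commits to the clockwise direction and reports the three-move route first. Both return the same set of simple solutions.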

The point about BFS producing solutions in order of length would seem to tip the
scales in favour of solutions1. But there is a downside: under BFS, the frontier can
be exponentially longer than under DFS. Suppose each state has K successors, and
the first solved state occurs at level n, meaning there is a sequence of n moves that
leads to a solved state. Under DFS the frontier will increase in size by K at each step,
so the final frontier will have length K × n. Under BFS, all the successors of all the
states at a distance of at most n away from the solved state will be queued up in the
frontier, so the frontier has a length of K^n. Consequently, BFS can use exponentially
more space than DFS. Worse, as defined above it can also take exponentially longer
time, because computing (ps ++ succs p) takes time proportional to the length of ps.

One way to make the algorithm faster, though it will not reduce the space complexity,
is to use a dedicated Queue data type to ensure that adding elements to the
rear is a constant-time operation. Another alternative is to introduce an accumulating
parameter, defining search1 by

search1 pss ps = search (ps ++ concat (reverse pss))

Then, after some simple calculation which we will again leave as an exercise, we 
arrive at 

solutions1 = search [] . start
  where search [] [] = []
        search pss [] = search [] (concat (reverse pss))
        search pss ((ms, t : ts) : ps)
          | solved t = ms : search pss ps
          | otherwise = search (succs (ms, t : ts) : pss) ps

In fact, there is another version of search in which the accumulating parameter is a 
list of paths rather than a list of lists of paths: 

search qs [] = if null qs then [] else search [] qs
search qs ((ms, t : ts) : ps)
  | solved t = ms : search qs ps
  | otherwise = search (succs (ms, t : ts) ++ qs) ps

This version has a different behaviour from the previous one, in that successive 
frontiers are traversed alternately from left to right and from right to left, but the 
solutions will still be produced in ascending order of length. 

Each of the search functions considered above produces all the solutions. If just 
one solution is required, then there is a further space-saving idea. The problem with 
each of the previous searches is that a list of the intermediate states has to be kept 
as part of each path in order to ensure that each path is a simple one, and this adds 
significantly to the total space required. By moving the membership test to the top 
level, we can guarantee not only that each path is simple, but also that only one path 
to a given state is maintained. 

Here are the details. A path now consists of a sequence of moves and the final 
state that results, so the definition of succs has to be changed to read 

succs (ms, t) = [(ms ++ [m], move t m) | m <- moves t]

Now we can define

solution1 :: State -> Maybe [Move]
solution1 t = search [] [([], t)]
  where search ts [] = Nothing
        search ts ((ms, t) : ps)
          | solved t = Just ms
          | elem t ts = search ts ps
          | otherwise = search (t : ts) (ps ++ succs (ms, t))

The first argument to search is a list of visited states, states whose successors have
already been added to the frontier. Using a list means that the membership test can
take linear time. As an alternative we can make use of the efficient set operations of
Section 4.4. The Haskell library Data.Set also provides the necessary operations, so
we can import it

import Data.Set (empty, insert, member)

and define

solution1 :: State -> Maybe [Move]
solution1 t = search empty [([], t)]
  where search ts [] = Nothing
        search ts ((ms, t) : ps)
          | solved t = Just ms
          | member t ts = search ts ps
          | otherwise = search (insert t ts) (ps ++ succs (ms, t))

This version of solution1 guarantees that member and insert operations both take
logarithmic time. This method of searching is what is usually given as the definition
of BFS. The companion function

solution2 :: State -> Maybe [Move]
solution2 t = search empty [([], t)]
  where search ts [] = Nothing
        search ts ((ms, t) : ps)
          | solved t = Just ms
          | member t ts = search ts ps
          | otherwise = search (insert t ts) (succs (ms, t) ++ ps)

is what is usually given as the definition of DFS. Neither function is suitable for
producing all solutions, but they will certainly produce one solution if one exists.
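To try the visited-set searches on something concrete, here is a sketch using a hypothetical water-jug puzzle that is not from the text: jugs of capacity 3 and 5, with the goal of measuring 4 in the larger jug. The helper solveWith, which abstracts the frontier discipline, and all the puzzle definitions are assumptions made for illustration.

```haskell
import Data.Set (empty, insert, member)

type State = (Int, Int)      -- hypothetical: contents of a 3-unit and a 5-unit jug
data Move = FillA | FillB | EmptyA | EmptyB | PourAB | PourBA
  deriving (Eq, Show)
type Path = ([Move], State)

move :: State -> Move -> State
move (a, b) m = case m of
  FillA  -> (3, b)
  FillB  -> (a, 5)
  EmptyA -> (0, b)
  EmptyB -> (a, 0)
  PourAB -> let d = min a (5 - b) in (a - d, b + d)
  PourBA -> let d = min b (3 - a) in (a + d, b - d)

-- Only moves that change the state are offered, so move t m /= t.
moves :: State -> [Move]
moves t = [m | m <- [FillA, FillB, EmptyA, EmptyB, PourAB, PourBA], move t m /= t]

solved :: State -> Bool
solved (_, b) = b == 4

succs :: Path -> [Path]
succs (ms, t) = [(ms ++ [m], move t m) | m <- moves t]

-- One search, parameterised by how new paths are added to the frontier.
solveWith :: ([Path] -> [Path] -> [Path]) -> State -> Maybe [Move]
solveWith add t0 = search empty [([], t0)]
  where
    search _ [] = Nothing
    search ts ((ms, t) : ps)
      | solved t    = Just ms
      | member t ts = search ts ps
      | otherwise   = search (insert t ts) (add ps (succs (ms, t)))

solution1, solution2 :: State -> Maybe [Move]
solution1 = solveWith (\ps new -> ps ++ new)   -- BFS: shortest solution
solution2 = solveWith (\ps new -> new ++ ps)   -- DFS: some solution
```

Replaying the result with foldl move (0, 0) is a convenient way to confirm that a returned sequence really ends in a solved state. The breadth-first version returns a shortest sequence, while the depth-first version may return a longer one.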


15.4 Lunar Landing 

Let us now see how DFS and BFS can be put to work in solving another puzzle. This
one is called Lunar Landing (it is also known as Lunar Lockout) and is an addictive
solitaire game invented by Hiroshi Yamamoto and publicised by Nob Yoshigahara,
the famous Japanese inventor of Rush Hour, a puzzle we will consider later on.
Although it can be played on boards of different shapes and sizes, the standard is a
5 × 5 square of cells, of which the centre cell is designated as an escape hatch. On
the board is a human astronaut and a number of bots, each occupying a single cell.
The aim of the game is to get the astronaut safely into the escape hatch. Both the
astronaut and the bots can move only horizontally or vertically. The catch is that
beyond the boundary of the board lies infinite space, and no bot or human ever wants
to go there. Consequently, each move involves moving a piece as far as possible in
a straight line until it comes to rest next to another piece which is blocking the path
into infinite space. The aim is to find a sequence of moves that enables the astronaut
to land exactly on the escape hatch.

Here is an example board, in which the astronaut is piece number 0, there are five
bots numbered from 1 to 5, and the escape hatch is marked with a ×:



In this position only bots 3 and 5 can move; the astronaut and the remaining bots
would shoot off into infinite space if moved. Bot 3 can move downwards one cell
and bot 5 upwards one cell. The longest sequence of moves involving bot 3 alone
is 3D 3R 3U 3R 3D. In words, bot 3 can move down, right, up, right, and down
until it ends up just above bot 4. Bot 5, on the other hand, can engage in an infinite
sequence of moves. The two sequences of moves

5U 5R
5U 5R 5U 5R 5D 5L 5D 5R

both result in the same final position in which bot 5 ends up to the left of bot 4. The
last six moves of the second sequence can be repeated ad infinitum. Nevertheless,
there is a unique nine-move solution to the puzzle. Pause for a moment to see if you
can find it.

The answer is the nine moves

5U 5R 5U 2L 2D 2L 0U 0R 0U

Bot 5 in three moves ends up below bot 1, then bot 2 in three moves ends up to the
right of bot 3, and finally the astronaut can escape in three more moves.

There is another solution involving 12 moves but only two pieces:

5U 5R 5U 5R 5D 5L 0U 0R 0U 0R 0D 0L

Notice that in this solution the astronaut passes over the escape hatch during her
third move, but only lands on it at the final move. There are example boards for
which there are an infinite number of solutions, so the associated digraph is cyclic.

The first decision concerns how to represent the board. The obvious method is to 
use Cartesian coordinates, but a more compact representation is to number the cells 
as follows: 


 1   2   3   4   5
 7   8   9  10  11
13  14  15  16  17
19  20  21  22  23
25  26  27  28  29


Cells that are a multiple of 6 represent the left and right borders, which will help in 
determining the moves. The escape hatch is cell 15. A board is a list of occupied 
cells with the first cell, at position 0, naming the location of the astronaut. For 
example, the board above is represented by the list [26,3,11,13,22,25]. Hence we 
define 

type Cell = Nat 
type Board = [Cell] 

solved :: Board -> Bool
solved b = (b !! 0 == 15)

The next decision is how to represent moves. Rather than take a move to be a 
named piece and a direction, we will represent a move by a named piece, its current 
position, and the finishing point of the move: 

type Name = Nat 

type Move = (Name, Cell, Cell)

A move can be recast in terms of directions by showMove, where 

showMove :: Move -> String
showMove (n, s, f) = show n ++ dir (s, f)

dir (s, f) = if abs (s - f) >= 6 then (if s < f then "D" else "U")
                                 else (if s < f then "R" else "L")

The function move is defined by 

move :: Board -> Move -> Board
move b (n, s, f) = b1 ++ f : b2 where (b1, _ : b2) = splitAt n b

It remains only to define moves, which takes the form

moves :: Board -> [Move]
moves b = [(n, s, f) | (n, s) <- zip [0 ..] b, f <- targets b s]

The function targets, which determines the destination cells of moves, is defined in 
terms of the four possible paths for moving a piece: 

targets :: Board -> Cell -> [Cell]
targets b c = concatMap try [ups c, downs c, lefts c, rights c]
  where try cs | null ys = []
               | null xs = []
               | otherwise = [last xs]
          where (xs, ys) = span (`notElem` b) cs

ups c = [c - 6, c - 12 .. 1]
downs c = [c + 6, c + 12 .. 29]
lefts c = [c - 1, c - 2 .. c - c `mod` 6 + 1]
rights c = [c + 1, c + 2 .. c - c `mod` 6 + 5]

Each of the various directions is examined in turn to see if there is a blocking piece 
along the path. If there is, the cell adjacent to the blocker is a possible target for a 
move. Putting these functions together, we can compute all the simple solutions for 
a given board by 

safeLandings = map (map showMove) . solutions

where solutions is, say, the breadth-first version defined in the previous section. 
When solutions was run on the example board, it produced 25 simple solutions, of 
which the first two were those described above. 
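The Lunar Landing definitions can be collected into a single runnable sketch. It assumes Nat is Int, inlines the breadth-first solutions of the previous section specialised to boards, and adds two extra clauses purely for totality. The name exampleBoard is introduced here for the board given above.

```haskell
type Nat   = Int                 -- assumption: Nat modelled as Int
type Cell  = Nat
type Board = [Cell]
type Name  = Nat
type Move  = (Name, Cell, Cell)
type Path  = ([Move], [Board])

solved :: Board -> Bool
solved b = b !! 0 == 15

moves :: Board -> [Move]
moves b = [(n, s, f) | (n, s) <- zip [0 ..] b, f <- targets b s]

move :: Board -> Move -> Board
move b (n, _, f) = b1 ++ f : b2 where (b1, _ : b2) = splitAt n b

-- Destination cells: slide in each direction until just before a blocker.
targets :: Board -> Cell -> [Cell]
targets b c = concatMap try [ups, downs, lefts, rights]
  where
    try cs | null ys || null xs = []   -- no blocker, or blocker is adjacent
           | otherwise          = [last xs]
      where (xs, ys) = span (`notElem` b) cs
    ups    = [c - 6, c - 12 .. 1]
    downs  = [c + 6, c + 12 .. 29]
    lefts  = [c - 1, c - 2 .. c - c `mod` 6 + 1]
    rights = [c + 1, c + 2 .. c - c `mod` 6 + 5]

showMove :: Move -> String
showMove (n, s, f) = show n ++ dir
  where dir | abs (s - f) >= 6 = if s < f then "D" else "U"
            | otherwise        = if s < f then "R" else "L"

succs :: Path -> [Path]
succs (ms, t : ts) = [ (ms ++ [m], t' : t : ts)
                     | m <- moves t, let t' = move t m, notElem t' ts ]
succs (_, [])      = []

-- Breadth-first enumeration of all simple solutions.
solutions :: Board -> [[Move]]
solutions b = search [([], [b])]
  where
    search [] = []
    search ((ms, t : ts) : ps)
      | solved t  = ms : search ps
      | otherwise = search (ps ++ succs (ms, t : ts))
    search ((_, []) : ps) = search ps

safeLandings :: Board -> [[String]]
safeLandings = map (map showMove) . solutions

exampleBoard :: Board
exampleBoard = [26, 3, 11, 13, 22, 25]
```

Because the result list is lazy, take 1 (safeLandings exampleBoard) stops as soon as the first solution is found, without enumerating the remaining ones.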


15.5 Forward planning 

With both DFS and BFS we have basically the strategy of trying sequences of
random moves until finding one that works. For some games and puzzles, indeed 
for some real-life problems, it is possible to improve on random search by suitable 
forward planning. The subject of planning algorithms is a broad one, and we will 
consider only one very simple situation. Suppose it is known that a certain sequence 
ms of moves is sufficient to take the starting state into a goal state. Such a sequence 
of moves constitutes the game plan. Now, it may or may not be the case that the first 
move m in ms is a valid move in the starting state. If it is, then move m is made and 
the algorithm carries on with the rest of the plan. If not, then it may be possible to 
find one or more lists of preparatory moves, each of which - provided they can be 
carried out - leads to a state in which move m is a legal move. After making these 
preparatory moves, the move m can then be made, in which case the rest of the plan 
is carried out as before. However, some of these preparatory moves may, in turn,
require further preparatory moves to be made, so the planning process may have to
be repeated. In the case that no preparatory moves can be found for a given move 
to be valid, a random move is made instead. It is because of this last possibility 
that a planning algorithm should be thought of as an extension of, rather than an 
alternative to, depth-first or breadth-first search. 

Here is an example. Suppose you wanted to move a grand piano to an upstairs
room. One sensible game plan is first to move the piano into the hallway, then to
lift the piano up the stairs, and finally to move the piano into the required room.
The first step may not be possible because (i) the pathway to the door is blocked
by a chair, and (ii) the piano will not go through the door without removing its
legs. In such a case the preparatory moves would consist of, in either order, moving
the chair and removing the legs of the piano. The first task, say moving the chair,
might be possible, but the second task would first involve obtaining a suitably large
screwdriver for unscrewing the legs. Once the task of moving the piano into the
hallway is accomplished, the next step, lifting the piano up the stairs, may not be
possible without calling on the help of a number of friends to assist in the lifting.
And so on.

Here are the details. Abstractly, a plan is a sequence of moves:

type Plan = [Move]

The game plan is provided by a function

gameplan :: State -> Plan

The problem is solved in a given starting state by making the moves in gameplan.
An empty plan means success. Otherwise, if the first move in the current plan can
be carried out, then the move is made and the plan proceeds with the remaining
moves. If it cannot, then we make use of a function

premoves :: State -> Move -> [Plan]

for formulating additional plans. Given a state and a move, each alternative plan in
premoves should enable the move to be made, provided the moves in the plan are
executed first. The first move in each plan returned by premoves may in turn require
further preparatory moves to be made, so we have to form new plans by iterating
premoves:

newplans :: State -> Plan -> [Plan]
newplans t [] = []
newplans t (m : ms) = if elem m (moves t) then [m : ms] else
                      concat [newplans t (pms ++ m : ms)
                             | pms <- premoves t m, all (`notElem` ms) pms]

The result of newplans is a possibly empty list of nonempty but finite plans, the first
move of which can be made in a given state. Plans cannot contain repeated moves.
If, in order to make a certain move, a plan requires that move to be made first, then
clearly the plan is cyclic and cannot be implemented.

Using just the two new functions newplans and gameplan, we can now formulate 
a search based on an extended type of path and frontier: 

type Path = ([Move], State, Plan)
type Frontier = [Path] 

This time, a path consists of the moves already made, the current state, and a plan 
for the remaining moves. We can define the planning algorithm to have the same 
structure as the time-efficient version of breadth-first search considered above: 

psolve :: State -> Maybe [Move]
psolve t = psearch [] [] [([], t, gameplan t)]
  where
    psearch :: [State] -> Frontier -> Frontier -> Maybe [Move]
    psearch ts [] [] = Nothing
    psearch ts qs [] = psearch ts [] qs
    psearch ts qs ((ms, t, plan) : ps)
      | solved t = Just ms
      | elem t ts = psearch ts qs ps
      | otherwise = psearch (t : ts) (bsuccs (ms, t, plan) ++ qs)
                                     (asuccs (ms, t, plan) ++ ps)

In psearch, all plans in the main frontier are tried first in a depth-first manner until 
one of them succeeds or all fail. The function asuccs is defined by 

asuccs :: Path -> [Path]
asuccs (ms, t, plan) = [(ms ++ [m], move t m, p) | m : p <- newplans t plan]

In particular, if elem m (moves t), then

asuccs (ms, t, m : plan) = [(ms ++ [m], move t m, plan)]

If all plans fail, we can make some legal move at random and start again with a new
game plan. The function bsuccs is defined by

bsuccs :: Path -> [Path]
bsuccs (ms, t, _) = [(ms ++ [m], t', gameplan t')
                    | m <- moves t, let t' = move t m]

Such additional plans are necessary for completeness: plans may fail even though 
there is a solution. This is a consequence of the fact that plans are executed greedily 
and moves that can be made are made. Note that, if newplans returns the empty list, 
so does asuccs. In such a case, psolve reduces to simple breadth-first search.

Figure 15.1 A simple Rush Hour grid


15.6 Rush Hour 

Let us now see how forward planning can help with another sliding-block puzzle.
This one is called Rush Hour, and is played on a 6 × 6 grid. Covering some of the
cells of the grid are cars and trucks, which are placed either horizontally or vertically.
Cars occupy two cells, while trucks occupy three. Horizontal vehicles can move left
or right and vertical vehicles up or down, provided their path is not obstructed by
another vehicle. One fixed cell, three places down along the right-hand side of the
grid, is special, and is called the exit cell. One vehicle is also special. It is horizontal
and occupies cells to the left of the exit cell. The object of the game is simply to
move the special car to the exit cell.

A very simple starting grid, reminiscent of a real car-park situation, is pictured
in Figure 15.1. Down the middle of the grid is a line of cars, the fourth of which
has moved one place forwards. The special car, the third one, cannot exit the car
park because its path is impeded by a vertical truck. To enable the special car to exit,
the truck has to move two places down (which counts as two moves), which in turn
requires the fourth car to move back into line (one move). The puzzle therefore has a
fairly obvious five-move solution (the special car takes two moves to get to the exit).
There are nine possible starting moves on the grid - the first car can move one step
left or right, the second car one step left, and so on - and breadth-first search could
involve examining about 9^5 moves before finding the shortest five-move solution.
Simple planning, on the other hand, leads at once to the answer. Of course, most
starting grids that come with the puzzle are considerably more difficult: there are
starting grids that take 93 moves to solve! Furthermore, as we will see, planning is
not guaranteed to find a shortest solution.

There are various ways to represent a grid, but we will take essentially the same
approach as in Lunar Landing and number the cells as follows:

 1   2   3   4   5   6
 8   9  10  11  12  13
15  16  17  18  19  20
22  23  24  25  26  27
29  30  31  32  33  34
36  37  38  39  40  41

The left and right borders are cells divisible by 7; the top border consists of cells
with negative numbers and the bottom border consists of cells with numbers greater
than 42. The exit cell is cell 20. A grid state can be defined as a list of pairs of
cells, with each pair (r, f) satisfying r < f and representing the rear and front cells
occupied by a single vehicle. The vehicles in the grid are named implicitly by their 
positions in the list, with the special vehicle being vehicle 0, so the first pair of 
numbers in the grid represents the cells occupied by vehicle 0, the second pair 
vehicle 1, and so on. For example, the grid of Figure 15.1 is represented by 

[(17,18), (3,4), (10,11), (12,26), (32,33), (38,39)] 

This representation is captured by introducing the type synonyms 

type Cell = Nat 
type Vehicle = (Cell, Cell)
type Grid = [Vehicle] 

The list of occupied cells in the grid can be constructed in increasing order by filling 
in the intervals associated with each vehicle and merging the results: 

occupied :: Grid -> [Cell]
occupied = foldr merge [] . map fill

fill :: Vehicle -> [Cell]
fill (r, f) = if horizontal (r, f) then [r .. f] else [r, r + 7 .. f]

horizontal :: Vehicle -> Bool
horizontal (r, f) = f - r < 6

The next decision concerns the representation of moves. A simple representation is 
to say that a move consists of a vehicle’s name and the target cell: 

type Name = Nat 
type Move = (Name, Cell)

For example, if a car occupies the cells (24,25) then the possible target cells are 23 
and 26. The valid moves are defined by 

moves :: Grid -> [Move]
moves g = [(n, c) | (n, v) <- zip [0 ..] g, c <- steps v, notElem c (occupied g)]

The function steps is defined by

steps (r, f) = if horizontal (r, f)
               then [c | c <- [f + 1, r - 1], c `mod` 7 /= 0]
               else [c | c <- [f + 7, r - 7], 0 < c && c < 42]

Each step involves moving a vehicle a step in one of two directions, left or right for 
horizontal vehicles, and up or down for vertical vehicles. In each case the target of 
the move has to be an unoccupied cell. 

The function move is implemented by 

move :: Grid -> Move -> Grid
move g (n, c) = g1 ++ [adjust v c] ++ g2
  where (g1, v : g2) = splitAt n g

adjust :: Vehicle -> Cell -> Vehicle
adjust (r, f) c = if f < c then (c - f + r, c) else (c, c + f - r)

Finally, a puzzle is solved if the front of the special car is at the exit cell: 

solved :: Grid -> Bool
solved g = snd (head g) == 20

Having defined moves, move, and solved, one can now implement a breadth-first or 
a depth-first search following the standard recipe. 
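As a sketch of that recipe, the Rush Hour definitions can be combined with the visited-set breadth-first search of Section 15.3. The code assumes Nat is Int, supplies a straightforward merge for ascending lists (the text relies on one without defining it here), and uses the names bfsolve and figure151 for the solver and the grid of Figure 15.1.

```haskell
import qualified Data.Set as Set

type Nat     = Int                    -- assumption: Nat modelled as Int
type Cell    = Nat
type Name    = Nat
type Vehicle = (Cell, Cell)
type Grid    = [Vehicle]
type Move    = (Name, Cell)

horizontal :: Vehicle -> Bool
horizontal (r, f) = f - r < 6

fill :: Vehicle -> [Cell]
fill (r, f) = if horizontal (r, f) then [r .. f] else [r, r + 7 .. f]

-- Assumed helper: merge two ascending lists.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x : xs) (y : ys)
  | x <= y    = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

occupied :: Grid -> [Cell]
occupied = foldr merge [] . map fill

steps :: Vehicle -> [Cell]
steps (r, f)
  | horizontal (r, f) = [c | c <- [f + 1, r - 1], c `mod` 7 /= 0]
  | otherwise         = [c | c <- [f + 7, r - 7], 0 < c && c < 42]

moves :: Grid -> [Move]
moves g = [(n, c) | (n, v) <- zip [0 ..] g, c <- steps v, notElem c (occupied g)]

move :: Grid -> Move -> Grid
move g (n, c) = g1 ++ adjust v c : g2
  where (g1, v : g2) = splitAt n g

adjust :: Vehicle -> Cell -> Vehicle
adjust (r, f) c = if f < c then (c - f + r, c) else (c, c + f - r)

solved :: Grid -> Bool
solved g = snd (head g) == 20

-- Breadth-first search for one solution, with a set of visited grids.
bfsolve :: Grid -> Maybe [Move]
bfsolve g0 = search Set.empty [([], g0)]
  where
    search _ [] = Nothing
    search gs ((ms, g) : ps)
      | solved g        = Just ms
      | Set.member g gs = search gs ps
      | otherwise       = search (Set.insert g gs)
                                 (ps ++ [(ms ++ [m], move g m) | m <- moves g])

figure151 :: Grid
figure151 = [(17,18), (3,4), (10,11), (12,26), (32,33), (38,39)]
```

On figure151 the function moves returns the nine starting moves mentioned earlier, and bfsolve returns a shortest solution because grids are dequeued in order of path length.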

Turning to psolve, it seems we need only define gameplan and premoves. However,
the definition of newplans given in the previous section needs to be modified
in order to work with Rush Hour. To see why, suppose the first move in the current 
plan is (0,19), moving the special car one step right from its initial position (17,18). 
Assume further that cell 19 is currently blocked by a vehicle, so there is a need 
for preparatory moves that move this vehicle out of the way. Now it is perfectly 
possible that one of these preparatory moves is (0,16), moving the special vehicle 
one step left. After executing these moves, we see that (0,19) is no longer a valid 
move because it requires car 0 to move two steps forward. 

To solve this problem, we will allow multi-step moves in plans, but expand them 
to single-step moves before computing new plans. Thus we redefine newplans to 
read 

newplans :: Grid -> Plan -> [Plan]
newplans g [] = []
newplans g (m : ms) = mkplans (expand g m ++ ms)
  where mkplans (m : ms) = if elem m (moves g) then [m : ms] else
                           concat [newplans g (pms ++ m : ms)
                                  | pms <- premoves g m, all (`notElem` ms) pms]

Figure 15.2 Six Rush Hour problems

Each move is expanded into a sequence of legal moves before new plans are made.
Furthermore, premoves now returns a list of possibly multi-step moves rather than a
list of sequences of moves. The function expand is defined by

expand :: Grid -> Move -> [Move]
expand g (n, c) = if horizontal (r, f)
                  then if f < c then [(n, d) | d <- [f + 1 .. c]]
                                else [(n, d) | d <- [r - 1, r - 2 .. c]]
                  else if f < c then [(n, d) | d <- [f + 7, f + 14 .. c]]
                                else [(n, d) | d <- [r - 7, r - 14 .. c]]
  where (r, f) = g !! n

Given the ability to make use of multi-step moves, we can define gameplan by 

gameplan :: Grid -> Plan
gameplan g = [(0, 20)]

To define premoves, observe that, if a move cannot be made it is because the target
cell is blocked by a vehicle, which therefore has to be moved out of the way. Each
additional plan therefore consists of a single, possibly multi-step move:

premoves :: Grid -> Move -> [Plan]
premoves g (n, c) = [[m] | m <- freeingmoves c (blocker g c)]

blocker :: Grid -> Cell -> (Name, Vehicle)
blocker g c = head [(n, v) | (n, v) <- zip [0 ..] g, elem c (fill v)]

The funcfion blocker refurns fhe name of fhe blocking vehicle and fhe cells occupied 
by ifs fronf and rear. To deftne. freeingmoves, observe fhaf, if a vehicle wifh lengfh 
Puzzle   bfsolve   moves   psolve   moves   dfsolve   moves
(1)       0.80s      34    0.08s      38     0.42s     1228
(2)       0.44s      18    0.03s      27     0.42s     2126
(3)       0.20s      55    0.12s      57     0.11s      812
(4)      16.83s      93    0.28s     121    17.27s    15542
(5)       4.14s      83    1.06s     119     3.47s     4794
(6)       0.78s      83    0.08s      89     0.27s     1323

Figure 15.3 Running times and move counts for the six Rush Hour problems


k is horizontal, then in order to free cell c we have to move the vehicle either
rightwards to cell c + k or leftwards to cell c - k. If the vehicle is vertical, then the
move is downwards to cell c + 7k or upwards to c - 7k. In each case the destination
cell has to be on the grid. For a horizontal vehicle (r,f) we have k = f - r + 1, while
for a vertical vehicle k = (f - r)/7 + 1. Hence we can define

freeingmoves :: Cell -> (Name, Vehicle) -> [Move]
freeingmoves c (n, (r,f)) =
  if horizontal (r,f)
  then [(n,j) | j <- [c - (f-r+1), c + (f-r+1)], a < j && j < b]
  else [(n,j) | j <- [c - (f-r+7), c + (f-r+7)], 0 < j && j < 42]
  where a = r - r `mod` 7; b = f - f `mod` 7 + 7

This completes the planning algorithm for Rush Hour. 
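
The two length formulas and the resulting candidate destinations can be checked in isolation. The following is only a sketch: the test used for horizontal (rear and front less than 6 apart in the 7-wide cell numbering) is an assumption made here for illustration, and vlength and targets are hypothetical helper names that do not appear in the text.

```haskell
type Cell    = Int
type Vehicle = (Cell, Cell)      -- (rear, front) cells

-- Assumption: a horizontal vehicle lies within one row of the 7-wide
-- cell numbering, so its rear and front differ by less than 6.
horizontal :: Vehicle -> Bool
horizontal (r, f) = f - r < 6

-- Length k of a vehicle, following the two formulas in the text.
vlength :: Vehicle -> Int
vlength (r, f)
  | horizontal (r, f) = f - r + 1
  | otherwise         = (f - r) `div` 7 + 1

-- Candidate destinations for freeing cell c, before the on-grid check.
targets :: Cell -> Vehicle -> [Cell]
targets c v
  | horizontal v = [c - k, c + k]
  | otherwise    = [c - 7 * k, c + 7 * k]
  where k = vlength v
```

For instance, the horizontal vehicle (17,18) has length 2, while the vertical vehicle (10,24) has length 3, so freeing cell 17 from the latter means moving to cell 17 - 21 or 17 + 21; only the second of these passes the on-grid test 0 < j < 42.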

So, is psolve better than a BFS or a DFS solution, and if so, by how much? 
Pictured in Figure 15.2 are six Rush Hour grids, the bottom three of which are 
amongst the hardest known starting grids. Each puzzle was tackled using bfsolve 
(a BFS to find one solution), psolve, and also dfsolve (a DFS to find one solution). 
The computations were run using GHCi, with the results given in Figure 15.3. In 
each case, psolve is faster than bfsolve, varying from a factor of two to a factor of 
60 in the case of puzzle (4). On the other hand, in no case did psolve find solutions 
with the minimum number of moves. As can be seen from the table, dfsolve found 
solutions with many more moves than necessary. 


15.7 Chapter notes 

The 8-queens puzzle was first described in 1848; many mathematicians, including
Gauss, have worked on the problem. See [10], and also [4] which contains an explicit
formula for computing a single solution for the n-queens problem for all n ≥ 4.
Using a massively parallel approach, the value of Q(27), the number of solutions
for the 27-queens problem, was found to be 234907967154122528 in September
2016; see [7]. Other values of Q(n) appear as sequence A000170 in the On-line
Encyclopedia of Integer Sequences (OEIS), see [11]. The bit-vector approach was
first described by Qiu Zongyan [12] and later rediscovered by Martin Richards [9].

The problem of computing expressions with a given sum appears in [1, Chapter 6]. 
Knuth [6, Section 7.2.1.6, Exercise 122] also discusses the problem, and gives 
other variants of the problem, such as allowing parentheses and further arithmetic 
operations. 

Lunar Landing is available at www.thinkfun.com/products, as is Rush Hour.
Lunar Landing is also known as Lunar Lockout and the UFO puzzle. A computer
analysis of the puzzle on boards of different shapes and sizes can be found in [8].
The planning algorithm for Rush Hour was first described in [1]. The complexity of
the problem is discussed in [3] and the hardest known starting grid is taken from
[2]. For more information on planning algorithms, consult [5].
Exercises

Exercise 15.1 What simple optimisation would make queens^ run faster? 

Exercise 15.2 Consider the solution for the 4-queens problem based on the function
search. Write down the successive arguments of search up to the point that the first 
solution is found. 

Exercise 15.3 In the solution to the n-queens problem based on bit vectors, there is 
another, non-recursive definition of bits which uses the Data.Bits function bit. The 
value of bit i is a bit vector with bit i set to 1 and all other bits set to 0. Give the 
alternative definition as a list comprehension. 

Exercise 15.4 In Section 15.2 the definition of solutions 100 [1..9] used
limited-precision integers. Why is this justified?

Exercise 15.5 Consider again the two functions solutions and solutions1 for computing
the number of ways a list of digits can be combined to give a target value.
Recall that the latter is an optimised version of the former based on the monotonicity
of expressions built out of × and +. What would you expect the value of

solutions 100 [0..9] == solutions1 100 [0..9]

to be?

Exercise 15.6 Express the condition that expressions built out of juxtaposition, ×,
and + never decrease the value of an expression more formally as a property of
glue. Does the condition hold when the digit 0 is allowed in expressions?

Exercise 15.7 If we allowed decimal points in expressions, then there are other
ways of making 100, including

100 = 1 × .2 + .3 + 45 + 6.7 × 8 + .9
100 = 1 × 23 × 4 + 5.6 + .7 + .8 + .9
100 = 1 × 23 × 4 + 5 + .6 + .7 + .8 + .9

In Haskell, .6 isn't a legal expression and one has to write 0.6 instead, but never
mind. Are there any other ways? Write a program to find out. As a hint, there
are seven ways to extend 2 × 3 + ··· on the left with a new digit 1, six ways to
extend .2 × 3 + ···, and five ways to extend 2.3 × 4 + ···. Base the program on the
following type synonyms:

type Expr = [Term]
type Term = [Factor]
type Factor = ([Digit], [Digit])

A factor (xs,ys) contains the digits xs before the decimal point and the digits ys
after the point. Either xs or ys can be the empty list but not both.
Exercise 15.8 Supposing we allowed exponentials in expressions, there is at least
one other way to make 100:

100 = 1 + 2 ^ 3 + 4 × 5 + 6 + 7 × 8 + 9

Are there any other ways? Write a program to find out. Again, no parentheses are
allowed. Base the program on the following type synonyms:

type Expr = [Term]
type Term = [Expo]
type Expo = [Factor]
type Factor = [Digit]
type Digit = Integer

Here a digit is an Integer and the values of expressions are Integer values because
the numbers involved can exceed the range of fixed-precision arithmetic. Each term
is now a product of a nonempty list of exponentials; for example

12 ^ 3 × 4 ^ 5 + 6 × 7

would be represented by

[[[[1,2],[3]], [[4],[5]]], [[[6]],[[7]]]]

Assume that exponentiation associates to the left, so that, for example, 2 ^ 3 ^ 2 = 64.
(In Haskell, exponentiation associates to the right, but solving the problem with 
this order of association would involve such huge numbers that the program would 
crash.) 

Exercise 15.9 Given that

exp = foldr f e . takeWhile p . iterate g

show that

exp x = if p x then f x (exp (g x)) else e

Exercise 15.10 Recall that in order to optimise BFS we defined

search1 pss ps = search (ps ++ concat (reverse pss))

Calculate search1 pss (p : ps), assuming the first state in p is not a solved state.

Exercise 15.11 Given is a permutation of [0.. n]. The aim is to sort the permutation 
into increasing order using only moves that interchange 0 with any neighbour that 
is at most two positions away. For example, [3,0,4,1,2] can produce [0,3,4,1,2], 
[3,4,0,1,2], and [3,1,4,0,2] in a single step. Can the aim always be achieved? 
Write a breadth-first search to find the shortest sequence of moves. (Hint: one
possible representation of states and moves is

type State = (Nat, Array Nat Nat)
type Move = Nat

The first component of a state is the location of 0 in the array, and a move is an
integer giving the target position for 0.)

Figure 15.4 A Rush Hour grid

Exercise 15.12 Imagine a row of differently sized jugs, given in ascending order of
their capacity. Initially all jugs are empty except for the last, which is full to the brim
with water. The object is to get to a situation in which one or more jugs contains
exactly a given target amount of water. A move in the puzzle consists of filling one
jug with water from another jug, or emptying one jug into another. (Water cannot
simply be discarded.) Suppose cap is a given array that determines the capacity of
each jug, and target is a given integer target. Decide on the representation of states
and moves and give the functions moves, move, and solved. Hence use breadth-first
search to find the unique shortest solution for the particular instance of three jugs,
with capacities 3, 5, and 8, and a target amount of 4.

Exercise 15.13 There are m elves and m dwarves on a river bank. There is also a
rowing boat that can take them to the other side of the river. All elves can row, but 
only n of the dwarves can. The boat can contain up to p passengers, one of whom 
has to be a rower. The problem is to transport the elves and dwarves safely to the 
other side in the shortest number of trips, where a trip is safe if the dwarves on 
either side of the river, or in the boat, never outnumber the elves. For the avoidance 
of doubt, the boat empties completely before new passengers get on. The exercise 
simply asks for a suitable way to model states and for the definitions of moves, 
move, and solved. 

Exercise 15.14 Write a function showMoves :: [Move] -> String for Lunar Landing
so that, for example, the moves 5U 5R 5U 2L 2D 2L 0U 0R 0U are recorded as
5URU 2LDL 0URU. The Haskell Data.List function groupBy may prove useful.

Exercise 15.15 Consider the Rush Hour grid g in Figure 15.4. Using directional
rather than cell-based notation, the value of gameplan for this grid is 0RRR, meaning
the special car 0 has to move three places to the right. By listing the appropriate
values of premoves, determine the value of newplans g gameplan.


Answers 

Answer 15.1 Just reverse the lists in help and use newDiag2.

queens n = map reverse (help n)
  where help 0 = [[]]
        help r = [q : qs | qs <- help (r-1), q <- [1..n],
                           notElem q qs, newDiag2 q qs]

Answer 15.2 The successive arguments of search are 

[[]] 

[[1],[2],[3],[4]] 

[[3,1],[4,1],[2],[3],[4]] 

[[4,1],[2],[3],[4]] 

[[2,4,1],[2],[3],[4]] 

[[2],[3],[4]] 

[[4,2],[3],[4]] 

[[1,4,2],[3],[4]] 

[[3,1,4,2],[3],[4]] 

Answer 15.3 One non-recursive definition of bits is 

bits :: Word16 -> [Word16]
bits v = [b | b <- map bit [0..15], v .&. b == b]
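
As a quick check of this definition, here is a self-contained sketch with the imports it needs. It assumes, as the exercise states, that bit and the bitwise conjunction operator (.&.) come from Data.Bits; Word16 comes from Data.Word.

```haskell
import Data.Bits (bit, (.&.))
import Data.Word (Word16)

-- The single-bit vectors that are set in v, lowest bit first.
bits :: Word16 -> [Word16]
bits v = [b | b <- map bit [0 .. 15], v .&. b == b]
```

With this definition bits 5 is [1,4], since 5 has bits 0 and 2 set, and bits 0 is the empty list.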

Answer 15.4 The largest value of an expression is 123456789, which is less than 
2^^, so Int arithmetic is adequate. Of course, when the input is longer than nine
digits, we should use Integer arithmetic. 

Answer 15.5 You would probably expect the answer to be True, but you would be
wrong. The left-hand side returns 17 solutions, while the right-hand side returns
only 14. One answer returned by solutions but not by solutions1 is

100 = 0 × 1 + 2 × 3 + 4 + 5 + 6 + 7 + 8 × 9

The reason for the discrepancy is that the monotonicity condition fails when the
digit 0 is allowed in expressions. In particular,

101 = 1 + 2 × 3 + 4 + 5 + 6 + 7 + 8 × 9

but when this expression is extended by the digit 0, the result can drop in value.

Answer 15.6 The monotonicity condition can be expressed by

elem y (glue d x) ⟹ value x ≤ value y

The proof follows from the fact that, if x and y are positive integers, then the larger
of x and y is no greater than any of the expressions 10 x + y, 10 y + x, x + y, or
x × y. The claim does not hold when zero values are allowed, because x ≤ x × 0 is
false for positive x. It also does not hold when exponentiation is allowed, because
x ≤ y ^ x is false unless y > 1. It also does not hold when decimal points are allowed
in expressions.

Answer 15.7 First of all, the seven ways of extending 2 × 3 + ··· are

.12 × 3 + ···      12 × 3 + ···      1.2 × 3 + ···
.1 × 2 × 3 + ···   1 × 2 × 3 + ···
.1 + 2 × 3 + ···   1 + 2 × 3 + ···

The monotonicity condition fails when decimal points are allowed, so only the naive
program works. We will just give the modified version of glue, which is

glue :: Digit -> Expr -> [Expr]
glue d [] = [[[([d],[])]], [[([],[d])]]]
glue d (((xs,ys) : fs) : ts)
  | null xs   = (((xs, d:ys) : fs) : ts) : rest
  | null ys   = [(([], d:xs) : fs) : ts, (([d], xs) : fs) : ts] ++ rest
  | otherwise = rest
  where rest = [((d:xs, ys) : fs) : ts,
                (([d],[]) : (xs,ys) : fs) : ts,
                (([],[d]) : (xs,ys) : fs) : ts,
                [([d],[])] : ((xs,ys) : fs) : ts,
                [([],[d])] : ((xs,ys) : fs) : ts]

It turns out that there are 198 ways to make 100 when decimal points are allowed.

Answer 15.8 The monotonicity condition fails when exponentiation is allowed, so
only the naive program works. Here is a modified version of glue:

glue :: Digit -> Expr -> [Expr]
glue d [] = [[[[[d]]]]]
glue d (((ds : fs) : es) : ts) = [(((d : ds) : fs) : es) : ts,
                                  (([d] : (ds : fs)) : es) : ts,
                                  ([[d]] : ((ds : fs) : es)) : ts,
                                  [[[d]]] : (((ds : fs) : es) : ts)]

An expression can be extended on the left in four ways.
It turns out that there are just three ways to make a century using exponentiation:

100 = 1 ^ 23 + 4 + 5 × 6 + 7 × 8 + 9
100 = 1 ^ 2 ^ 3 + 4 + 5 × 6 + 7 × 8 + 9
100 = 1 + 2 ^ 3 + 4 × 5 + 6 + 7 × 8 + 9
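
To see how the nested-list representation of Exercise 15.8 evaluates, here is a minimal sketch of an evaluator. The names valExpr, valTerm, valExpo, and valFactor are assumptions introduced for this illustration; the left-associative reading of exponentiation is the one the exercise stipulates.

```haskell
type Expr   = [Term]
type Term   = [Expo]
type Expo   = [Factor]
type Factor = [Digit]
type Digit  = Integer

-- An expression is a sum of terms.
valExpr :: Expr -> Integer
valExpr = sum . map valTerm

-- A term is a product of exponentials.
valTerm :: Term -> Integer
valTerm = product . map valExpo

-- Exponentiation associates to the left, as the exercise stipulates.
valExpo :: Expo -> Integer
valExpo = foldl1 (^) . map valFactor

-- A factor is a nonempty list of decimal digits.
valFactor :: Factor -> Integer
valFactor = foldl (\n d -> 10 * n + d) 0
```

For example, the representation [[[[1,2],[3]], [[4],[5]]], [[[6]],[[7]]]] of 12 ^ 3 × 4 ^ 5 + 6 × 7 evaluates to 1769514, and the third century above, written as [[[[1]]], [[[2],[3]]], [[[4]],[[5]]], [[[6]]], [[[7]],[[8]]], [[[9]]]], evaluates to 100.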

Answer 15.9 We have

  exp x
= { definition of iterate }
  foldr f e (takeWhile p (x : iterate g (g x)))
= { definition of takeWhile, assuming p x }
  foldr f e (x : takeWhile p (iterate g (g x)))
= { definition of foldr }
  f x (foldr f e (takeWhile p (iterate g (g x))))
= { definition of exp }
  f x (exp (g x))

On the other hand, exp x = e if p x is false.
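
The equation can also be spot-checked by running both forms side by side. In this sketch the parameters f, e, p, and g are passed explicitly, and the names expIter and expRec are assumptions chosen to avoid a clash with the Prelude function exp.

```haskell
-- The specification: fold over the iterates of g that satisfy p.
expIter :: (a -> b -> b) -> b -> (a -> Bool) -> (a -> a) -> a -> b
expIter f e p g = foldr f e . takeWhile p . iterate g

-- The recursive form derived in the calculation above.
expRec :: (a -> b -> b) -> b -> (a -> Bool) -> (a -> a) -> a -> b
expRec f e p g x = if p x then f x (expRec f e p g (g x)) else e
```

Taking f = (:), e = [], p = (< 20), and g = (* 2), both functions applied to 1 give [1,2,4,8,16]. A test of this kind illustrates the identity but is of course no substitute for the calculation.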

Answer 15.10 The calculation is as follows, assuming the first state in p is not a
solved state:

  search1 pss (p : ps)
= { definition of search1 }
  search (p : ps ++ concat (reverse pss))
= { definition of search, given assumption }
  search (ps ++ concat (reverse pss) ++ succs p)
= { definition of concat and reverse }
  search (ps ++ concat (reverse (succs p : pss)))
= { definition of search1 }
  search1 (succs p : pss) ps

Answer 15.11 Yes, the aim can always be achieved. One way is first to get the
largest element into its final position and then apply the same method recursively,
leaving the largest element untouched. To get the largest element into its final
position, first position 0 one place to the right of the largest element. For example,
counting positions from 1, the single move 2 converts [4,1,3,0,2] to [4,0,1,3,2],
while the two moves 4 and 3 convert [3,0,1,4,2] to [3,4,0,1,2]. Let j be the
position of the largest element. Repeatedly apply the moves j, j + 2, j + 1, j + 3,
and so on, followed by a final move n - 1, to shuffle the largest element to the
right. Continuing the example, the moves 2, 4, 3, and 5 convert [3,4,0,1,2] to
[3,1,2,4,0], which, followed by move 4, yields [3,1,2,0,4].
The relevant definitions for a breadth-first search are

start :: [Nat] -> State
start xs = (hole x, x)
  where x = listArray (1, length xs) xs
        hole x = head [j | j <- [1..], x ! j == 0]

moves :: State -> [Move]
moves (j,x) = [k | k <- [j-1, j-2, j+1, j+2], a <= k && k <= b]
  where (a,b) = bounds x

move :: State -> Move -> State
move (j,x) k = (k, x // [(j, x ! k), (k, x ! j)])

solved :: State -> Bool
solved (j,x) = sorted (elems x)
  where sorted xs = and (zipWith (<=) xs (tail xs))

For example, 

solution (start [4,1,3,0,2]) = Just [3,1,2,4,3,5,4,2,1]

A total of nine moves sorts the numbers. 
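
The definitions of moves and move can be exercised on the example from the statement of Exercise 15.11. This is a sketch under two assumptions: Nat is taken to be Int, and oneStep is a hypothetical helper, not part of the answer, that lists the permutations reachable in a single step.

```haskell
import Data.Array (Array, bounds, elems, listArray, (!), (//))

type Nat   = Int                    -- assumption: naturals modelled as Int
type State = (Nat, Array Nat Nat)
type Move  = Nat

start :: [Nat] -> State
start xs = (hole, x)
  where x    = listArray (1, length xs) xs
        hole = head [j | j <- [1 ..], x ! j == 0]

-- Target positions for 0: at most two places away and inside the array.
moves :: State -> [Move]
moves (j, x) = [k | k <- [j - 1, j - 2, j + 1, j + 2], a <= k && k <= b]
  where (a, b) = bounds x

-- Interchange 0 (at position j) with the element at position k.
move :: State -> Move -> State
move (j, x) k = (k, x // [(j, x ! k), (k, x ! j)])

-- The permutations reachable in one step, shown as plain lists.
oneStep :: [Nat] -> [[Nat]]
oneStep xs = [elems (snd (move s k)) | k <- moves s]
  where s = start xs
```

Here oneStep [3,0,4,1,2] yields [0,3,4,1,2], [3,4,0,1,2], and [3,1,4,0,2], which are exactly the three results listed in the exercise.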

Answer 15.12 A simple representation is to use an array of naturals for states and
two naturals, the source and destination jugs, for the moves:

type State = Array Nat Nat
type Move = (Nat,Nat)

The possible moves in a given state consist of a pair of distinct integers in which the
source jug is nonempty and the target jug is not full:

moves :: State -> [Move]
moves t = [(j,k) | j <- indices t, k <- indices t, j /= k, 0 < t ! j, t ! k < cap ! k]

The puzzle is solved when the target value appears in the array:

solved :: State -> Bool
solved t = elem target (elems t)

Finally, to determine the result of a move, observe that the total quantity of water in
the two jugs remains the same, and either the source is emptied or the target is filled
to its capacity. That leads to

move :: State -> Move -> State
move x (j,k) = if t <= c then x // [(j,0),(k,t)] else x // [(j,t-c),(k,c)]
  where t = x ! j + x ! k; c = cap ! k

The unique solution of length six for the three-jugs problem is 
[(3,2), (2,1), (1,3), (2,1), (3,2), (2,1)] 
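
The stated solution can be replayed with the definition of move. The sketch below fixes cap and target to the three-jug instance and takes Nat to be Int, both assumptions made for the illustration; replay is a hypothetical helper that applies a list of moves to the initial state in which only the last jug is full.

```haskell
import Data.Array (Array, elems, listArray, (!), (//))

type Nat   = Int                    -- assumption: naturals modelled as Int
type State = Array Nat Nat
type Move  = (Nat, Nat)

cap :: Array Nat Nat
cap = listArray (1, 3) [3, 5, 8]

target :: Nat
target = 4

-- Pour from jug j into jug k until j is empty or k is full.
move :: State -> Move -> State
move x (j, k) = if t <= c then x // [(j, 0), (k, t)] else x // [(j, t - c), (k, c)]
  where t = x ! j + x ! k
        c = cap ! k

solved :: State -> Bool
solved t = elem target (elems t)

-- Replay moves from the initial state, in which only the last jug is full.
replay :: [Move] -> State
replay = foldl move (listArray (1, 3) [0, 0, 8])
```

Replaying the six moves gives the jug contents [0,5,3], [3,2,3], [0,2,6], [2,0,6], [2,5,1], and finally [3,4,1], at which point the middle jug holds the target amount 4.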
Answer 15.13 Data for the problem is defined by three numbers (m,n,p), where m
is the total number of elves (the same as the number of dwarves), n is the number of
dwarves who can row, and p is the maximum number of passengers allowed in a
boat:

type Data = (Nat,Nat,Nat)

One possible definition of a state is a quadruple

type State = (Bool,Nat,Nat,Nat)

In a state (b,e,d,r) the boolean b is True if the boat is empty and at the left bank
and False if it is empty and at the right bank. The values (e,d,r) are the numbers
of elves, of non-rowing dwarves, and of dwarves who can
row, on the left bank of the river, so the corresponding values on the right bank are
(m - e, m - n - d, n - r). Assuming everyone is initially on the left bank, the initial
state is

start :: Data -> State
start (m,n,p) = (True, m, m - n, n)

The puzzle is solved if nobody is left on the left bank:

solved :: State -> Bool
solved t = (t == (False,0,0,0))

A state is safe if the dwarves never outnumber the elves on either bank. If (e,d,r)
are the numbers on the left bank, then we require

(e == 0 || e >= d + r) && (m - e == 0 || m - e >= m - (d + r))

which simplifies to

safe :: Nat -> State -> Bool
safe m (b,e,d,r) = (e == 0 || e == m || e == d + r)

A move consists of the number of elves, non-rowing dwarves, and dwarves who can
row, representing the passengers carried on the boat:

type Move = (Nat,Nat,Nat)

A move is legal if it contains at most p people, at least one rower, and if the dwarves
do not outnumber the elves:

legal :: Nat -> Move -> Bool
legal p (x,y,z) = x + y + z <= p && (x >= 1 || z >= 1) && (x == 0 || x >= y + z)

The function move is now defined by

move :: State -> Move -> State
move (True,e,d,r) (x,y,z) = (False, e - x, d - y, r - z)
move (False,e,d,r) (x,y,z) = (True, e + x, d + y, r + z)

A move consists of the boat travelling from one side of the river to the other, and
emptying all passengers onto the river bank. The function moves is defined by

moves :: Data -> State -> [Move]
moves (m,n,p) t@(b,e,d,r)
  = [(x,y,z) | x <- [0..i], y <- [0..j], z <- [0..k],
               legal p (x,y,z) && safe m (move t (x,y,z))]
  where (i,j,k) = if b then (e,d,r) else (m - e, m - n - d, n - r)

For example, the (3,1,2) problem has four solutions, each involving 13 crossings
in total.

Answer 15.14 One possibility:

showMoves :: [Move] -> [String]
showMoves = map showMove . groupBy sameName
  where sameName m1 m2 = name m1 == name m2
        name (n,_,_) = n
        showMove ms = show (name (head ms)) ++
                      concatMap dir [(s,f) | (_,s,f) <- ms]

Answer 15.15 We have, again writing moves in expanded, directional form:

premoves g 0R = [[1DDD]]
premoves g 1D = [[4L]]
premoves g 4L = [[3U], [3DD]]
premoves g 3U = [[2R]]
premoves g 2R = [[1DD]]

But the move 1DD repeats part of 1DDD, so this plan is rejected. We are left with a
single plan, namely

newplans g gameplan = [3DD, 4L, 1DDD, 0RRR]

All moves in this plan can be carried out, leading to the solution.




Chapter 16 


Heuristic search 


In the search methods we have looked at so far, we have always chosen the first 
path on the frontier for expansion, the only difference between breadth-first and
depth-first search being the order in which we added newly formed paths to the 
frontier. In heuristic search, we make use of a given estimate of how likely it is 
that each path will lead to a good result, and we choose next the one with the best 
expectation. The hope is that, if the estimate is reasonably accurate, then we will 
find an optimum path more quickly. With heuristic search the frontier is managed as 
a priority queue in which the priorities are estimates of how good a path is. At each 
step the path with the highest priority is chosen for further expansion. Heuristic 
search is useful only when searching for a single solution to a problem. 

The primary example of heuristic search is the problem of finding a route between 
two towns in a network of roads. The cost of getting to the final destination from a 
given town can be estimated as the straight-line distance between the two towns, 
the distance a crow would have to fly. No real route could have a shorter distance, 
so this is an optimistic estimate. The choice of the next partial route for further 
exploration will be one that minimises the sum of the cost of the route so far and 
the estimate of how much further there is to travel. Contrast this with Dijkstra’s 
algorithm, which always explores a partial route whose cost is the minimum so far, 
ignoring any estimate for completing the journey. Route-finding algorithms have 
many applications in artificial intelligence, including robotics, games, and puzzles. 
We will take a look at some examples later on. 

Heuristic search is usually described using the terminology of graphs and edges 
rather than states and moves. We will assume throughout this chapter that graphs 
consist of a finite number of vertices and directed edges, and that the cost (or weight) 
of each edge is always a positive number. We describe two closely related algorithms 
for carrying out heuristic search, each of which depends on a different assumption 
about the estimating function. We revisit the necessary operations of priority queues, 
and also describe a new structure, a priority search queue, that can help to improve
the running time of the search. 


16.1 Searching with an optimistic heuristic 

By definition, an estimating function or heuristic is a function h from vertices to
costs such that h(v) estimates the cost of getting from vertex v to a closest goal (in
general, there may be a number of possible goals rather than one single destination).
Such a function is said to be optimistic if it never overestimates the actual cost. In
symbols, if H(v) is the minimum cost of any path from vertex v to a closest goal,
then h(v) ≤ H(v) for all vertices v. If there is no path from v to a goal, then h(v) is
unconstrained. An optimistic heuristic is also called an admissible heuristic. In this 
section we give two algorithms that work whenever the heuristic is optimistic and 
there is a path from the source to a goal. 

The first algorithm is a very basic form of heuristic search, which we will call T* 
search simply because the underlying algorithm is really a tree search. Here are the 
types we need, with the exception of Vertex, which depends on the application: 

type Cost = Nat
type Graph = Vertex -> [(Vertex, Cost)]
type Heuristic = Vertex -> Cost
type Path = ([Vertex], Cost)

We assume that a graph is given not as a list of vertices and edges, but as a function 
from vertices to lists of adjacent vertices together with the associated edge costs. 
This function corresponds to the function moves of the previous chapter, except that 
now we assume each move from one state to another is associated with a certain 
cost. A path is a list of vertices along with the cost of the path. For efficiency, paths
will be constructed in reverse order, so the endpoint of a path is the first element in
the list of vertices:

end :: Path -> Vertex
end = head . fst

cost :: Path -> Cost
cost = snd

extract :: Path -> Path
extract (vs,c) = (reverse vs, c)

In terms of states and moves, a path would be a triple consisting of a list of moves, 
an end state, and the cost of the moves. We will also make use of the following 
operations on priority queues from Section 8.3: 
insertQ  :: Ord p => a -> p -> PQ a p -> PQ a p
addListQ :: Ord p => [(a,p)] -> PQ a p -> PQ a p
deleteQ  :: Ord p => PQ a p -> ((a,p), PQ a p)
emptyQ   :: PQ a p
nullQ    :: PQ a p -> Bool

To recap: insertQ adds a new value with a given priority to the queue; addListQ adds 
a list of value-priority pairs to an existing queue; deleteQ deletes a value with the 
lowest priority from the queue, returning the value, its priority, and the remaining 
queue; emptyQ is the empty queue; and nullQ is a test for whether the queue is 
empty or not. In what follows we will not need deleteQ to return the priority of a 
value, so we introduce the variant 

removeQ :: Ord p => PQ a p -> (a, PQ a p)
removeQ q1 = (x, q2) where ((x,_), q2) = deleteQ q1

Here now is the definition of tstar:

tstar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
tstar g h goal source = tsearch start
  where start = insertQ ([source],0) (h source) emptyQ
        tsearch ps | nullQ ps     = Nothing
                   | goal (end p) = Just (extract p)
                   | otherwise    = tsearch rs
          where (p,qs) = removeQ ps
                rs     = addListQ (succs g h p) qs

As inputs to tstar we have a graph, a heuristic function, a test for whether a vertex 
is a goal or not, and the source vertex. The frontier is maintained as a priority queue 
of paths and their costs, initially containing the single path [source] with cost 0 and 
priority h{source). If the queue is not empty, then a path with the lowest estimate of 
how much it costs to complete the journey is selected. If the selected path ends at a 
goal node, then that path is the result; otherwise its successor paths are added to the 
queue. The subsidiary function succs returns a list of possible successor paths: 

succs :: Graph -> Heuristic -> Path -> [(Path, Cost)]
succs g h (u : vs, c) = [((v : u : vs, c + d), c + d + h v) | (v,d) <- g u]

Note carefully that the priority of a new path is not simply the estimate of how far 
away the endpoint is from a goal, but the sum of the cost of getting to the endpoint 
and the estimate of the remaining cost. It is left as an exercise to show that taking 
the estimate alone as the priority can lead to a solution that is not the shortest. 

The tstar algorithm is not a very satisfactory one. One fundamental flaw is that it 
is not guaranteed to terminate. For example, consider the graph 
with source vertex A and an isolated goal vertex C. Instead of terminating with
Nothing, the function tstar goes into an infinite loop, constructing longer and longer
paths A, AB, ABA, ABAB, and so on, in a fruitless attempt to find the goal. A similar
phenomenon occurs with the graph 



and the optimistic heuristic h = const 0. Only after 100 oscillations between A and 
B will tstar discover the path ABC with cost 101. Even worse, with the graph 



tstar will oscillate about 2^*^ times between A, B, and D before finding the final path.
Hence tstar can be very inefficient. We will remedy both these problems below. 

Nevertheless, provided there is a path from the source to a goal, tstar will find one 
with minimum cost. The only provisos are that h be optimistic and that edge costs 
are positive numbers. Say a path from the source vertex s is a good path if it can be
completed to a path with minimum cost. We show that at each step of tstar there is a
good path on the frontier and, moreover, some good path will eventually be selected
for further expansion at a subsequent step. The claim is clearly true initially. For the
induction step, suppose p is a good path on the frontier, with endpoint v say. Let
c(p) be the cost of p, so

c(p) + h(v) ≤ c(p) + H(v) = H(s)

since h is optimistic. Recall that H(v) is the minimum cost of any path from v to a
goal and s is the starting vertex. Suppose some bad path q, with endpoint u say, is
chosen at the next step, so

c(q) + h(u) ≤ c(p) + h(v) ≤ H(s)

However, u cannot be a goal state, for otherwise

c(q) + h(u) = c(q) + 0 > H(s)

since q is a bad path. The final step of the proof is to observe that bad paths cannot
be added indefinitely to the frontier before a good path is selected for expansion.
Let δ > 0 be the minimum cost of any edge (recall that graphs are finite, so there
are finitely many edges). A path of length k therefore has cost at least kδ, so no bad
path of length greater than H(s)/δ can be added to the frontier before a good path
is selected for expansion.

Figure 16.1 A simple graph

Here is an example to show tstar at work. Consider the graph of Figure 16.1, in 
which A is the source vertex and D is the goal. Suppose h is the optimistic heuristic 



      A   B   C   D
h     9   1   5   0


The successive queue entries in priority order are as follows (paths are written in 
the normal left-to-right direction): 

A (0 + 9) 

AB(5 + 1), AC (2 + 5) 

AC (2 + 5), ABD(10 + 0) 

ACB(4 + 1), ABD(10 + 0) 

ACBD(9 + 0), ABD(10 + 0) 

The algorithm starts off with the single queue entry A (0 + 9), where the first 
component of the sum is the distance from the source vertex A and the second 
component is the heuristic value. Subsequent paths in the queue are as above. 
Although the non-optimal path ABD is inserted into the queue at the second step, 
it is never selected; instead the final path returned by the algorithm is ACBD with
cost 9. As this example demonstrates, tstar works best when the underlying graph 
is acyclic. 
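
The trace above can be replayed with a small self-contained sketch. Two things here are assumptions rather than the text's own definitions: the priority queue of Section 8.3 is replaced by a plain list kept in ascending priority order, and the edges of the graph (named fig161 here) are inferred from the queue entries in the trace, since the figure itself is not reproduced. Vertices are taken to be characters.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

type Cost      = Int
type Vertex    = Char
type Graph     = Vertex -> [(Vertex, Cost)]
type Heuristic = Vertex -> Cost
type Path      = ([Vertex], Cost)

-- Stand-in priority queue (assumption): a list sorted by priority.
type PQ a p = [(a, p)]

emptyQ :: PQ a p
emptyQ = []

nullQ :: PQ a p -> Bool
nullQ = null

insertQ :: Ord p => a -> p -> PQ a p -> PQ a p
insertQ x p = insertBy (comparing snd) (x, p)

addListQ :: Ord p => [(a, p)] -> PQ a p -> PQ a p
addListQ xps q = foldr (uncurry insertQ) q xps

removeQ :: PQ a p -> (a, PQ a p)
removeQ ((x, _) : q) = (x, q)
removeQ []           = error "removeQ: empty queue"

end :: Path -> Vertex
end = head . fst

extract :: Path -> Path
extract (vs, c) = (reverse vs, c)

succs :: Graph -> Heuristic -> Path -> [(Path, Cost)]
succs g h (u : vs, c) = [((v : u : vs, c + d), c + d + h v) | (v, d) <- g u]
succs _ _ _           = []   -- paths are never empty; here for totality

tstar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
tstar g h goal source = tsearch start
  where
    start = insertQ ([source], 0) (h source) emptyQ
    tsearch ps
      | nullQ ps     = Nothing
      | goal (end p) = Just (extract p)
      | otherwise    = tsearch rs
      where
        (p, qs) = removeQ ps
        rs      = addListQ (succs g h p) qs

-- Edge costs inferred from the queue entries in the trace (assumption).
fig161 :: Graph
fig161 'A' = [('B', 5), ('C', 2)]
fig161 'B' = [('D', 5)]
fig161 'C' = [('B', 2)]
fig161 _   = []

-- The optimistic heuristic tabulated in the text.
h161 :: Heuristic
h161 v = case v of { 'A' -> 9; 'B' -> 1; 'C' -> 5; _ -> 0 }
```

Under these assumptions tstar fig161 h161 (== 'D') 'A' returns Just ("ACBD", 9), the last path selected in the trace.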

The obvious way to remedy the fact that tstar may not terminate is to maintain a 
second argument that records which vertices have already been visited, meaning 
their successors have been added to the queue. That way, no vertex is processed 
more than once. After all, this was exactly what was done in depth-first and breadth- 
first search. However, the idea does not work: it is possible that a second path to the 
same vertex with a smaller cost, and hence a better estimate, may be found later on. 
so vertices may have to be processed more than once. For example, in Figure 16.1
the vertex B is visited twice in order to discover the second, shorter path to D. 

One solution to the problem is to maintain a finite map from vertices to path 
costs instead of a set containing the vertices that have already been visited. If a 
new path to a vertex is discovered with a lower cost, then the path can be explored 
further. Otherwise the path can be abandoned. Another solution, involving a stronger 
assumption about the heuristic function h, is left to the following section. 

The finite map can be implemented as a simple association list of vertex-cost 
pairs, but a more efficient alternative is to make use of the Haskell library Data.Map 
and the three operations 

empty  :: Ord k => Map k a
lookup :: Ord k => k -> Map k a -> Maybe a
insert :: Ord k => k -> a -> Map k a -> Map k a

The last two operations take logarithmic rather than linear time. To avoid name 
clashes, we use a qualified import: 

import qualified Data.Map as M 

Here now is the definition of a revised search, the algorithm known as A* search: 

astar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
astar g h goal source = asearch M.empty start
  where start = insertQ ([source],0) (h source) emptyQ
        asearch vcmap ps | nullQ ps       = Nothing
                         | goal (end p)   = Just (extract p)
                         | better p vcmap = asearch vcmap qs
                         | otherwise      = asearch (add p vcmap) rs
          where (p,qs) = removeQ ps
                rs     = addListQ (succs g h p) qs

better :: Path -> M.Map Vertex Cost -> Bool
better (v : vs, c) vcmap = query (M.lookup v vcmap)
  where query Nothing   = False
        query (Just c') = c' <= c

add :: Path -> M.Map Vertex Cost -> M.Map Vertex Cost
add (v : vs, c) vcmap = M.insert v c vcmap

The additional argument vcmap to asearch is a finite map of vertex-cost pairs. The 
test better determines of a path whether or not another path to the same endpoint 
but with a smaller cost has already been found. If so, the path can be abandoned. 
The operation M.lookup looks up a vertex in the finite map, returning Nothing if 
there is no binding, and the associated cost if there is. The function better returns 
True just in the case that there is such an associated cost and it is no larger than
the given cost. The function add adds a new vertex-cost pair, or overwrites an old 
binding with the same vertex and the new cost. 
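
To see the effect of the finite map, here is a compact, self-contained sketch of the same search. It is not the text's code verbatim: the priority queue is again a sorted list (an assumption standing in for Section 8.3), better and add are inlined, and the two test graphs are assumptions. The graph cycleAB models the two-vertex cycle with an isolated goal C using unit edge costs, and fig161 uses edges inferred from the earlier trace.

```haskell
import qualified Data.Map as M
import Data.List (insertBy)
import Data.Ord (comparing)

type Cost      = Int
type Vertex    = Char
type Graph     = Vertex -> [(Vertex, Cost)]
type Heuristic = Vertex -> Cost
type Path      = ([Vertex], Cost)

-- Stand-in priority queue (assumption): a list sorted by priority.
type PQ a p = [(a, p)]

insertQ :: Ord p => a -> p -> PQ a p -> PQ a p
insertQ x p = insertBy (comparing snd) (x, p)

astar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
astar g h goal source = asearch M.empty [(([source], 0), h source)]
  where
    asearch _ [] = Nothing
    asearch vcmap ((p@(v : _, c), _) : qs)
      | goal v    = Just (reverse (fst p), c)
      | better    = asearch vcmap qs                 -- abandon this path
      | otherwise = asearch (M.insert v c vcmap) rs  -- record cost, expand
      where
        -- True if v was already reached at a cost no larger than c
        better = maybe False (<= c) (M.lookup v vcmap)
        rs     = foldr (\(q, pr) -> insertQ q pr) qs (succs p)
    asearch _ _ = Nothing   -- paths are never empty; here for totality
    succs (u : vs, c) = [((v : u : vs, c + d), c + d + h v) | (v, d) <- g u]
    succs _           = []

-- A two-vertex cycle; the goal C is unreachable (unit costs assumed).
cycleAB :: Graph
cycleAB 'A' = [('B', 1)]
cycleAB 'B' = [('A', 1)]
cycleAB _   = []

-- Edges inferred from the trace for Figure 16.1 (assumption).
fig161 :: Graph
fig161 'A' = [('B', 5), ('C', 2)]
fig161 'B' = [('D', 5)]
fig161 'C' = [('B', 2)]
fig161 _   = []
```

On cycleAB with h = const 0 the search stops with Nothing: the path ABA is abandoned because A is already recorded with cost 0. On fig161 with the heuristic tabulated earlier, vertex B is processed twice, first at cost 5 and then at cost 4, and the result is Just ("ACBD", 9).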

Let us first prove that astar terminates for all inputs. There are a finite number of 
simple paths in a finite graph, paths that do not contain repeated vertices. Because 
the edge weights are positive numbers, no non-simple path can have a smaller 
cost than the corresponding simple path to the same destination. Let M denote the 
maximum cost taken over all simple paths, of which there are a finite number. Then 
any vertex can be processed at most M times during the computation because each 
processing step requires a path with a strictly smaller cost than before. It follows 
that astar terminates after Mn steps at the very most, where n is the number of 
vertices in the graph. If no path from the source to the target has been found, then 
no such path exists, and the algorithm correctly returns Nothing. 

To show that astar terminates with a minimum-cost path from the source to a 
goal, assuming there is one, we can follow the proof of the correctness of tstar. We 
only have to show that at every step there is a good path on the frontier. Say that 
a path p with endpoint v is open if p is on the frontier and there is no entry (v, c)
recorded by the finite map with c ≤ c(p). Otherwise, say that p is closed. Open
paths are candidates for further expansion, while closed paths are not. 

Let P = [v_0, v_1, ..., v_n] be an optimal path from the source v_0 to a goal v_n and
let P_j denote the initial segment [v_0, v_1, ..., v_j] for 0 ≤ j < n. We show that at each
step there is an open path p with endpoint v_j for some j and such that c(p) = c(P_j).
Hence p can be completed to an optimal path. The assertion holds at the very first
step because P_0 is open. Otherwise, let D be the set of vertices v_i for which there
is a closed path q from v_0 to v_i on the frontier with c(q) = c(P_i). The set D is not
empty, because it contains v_0. Let v_i be the vertex with largest index in D and set
j = i + 1. Define p to be the path q followed by the single edge (v_i, v_j) with cost c.
Then p is an open path and

c(p) = c(q) + c = c(P_i) + c = c(P_j)

That completes the proof that astar correctly returns an optimal solution. 


16.2 Searching with a monotonic heuristic 

Now we turn to a second solution to the problem with tstar. This time we need to 
assume more about the heuristic function h, namely that it is monotonic. A heuristic 
h is monotonic if h(u) ≤ c + h(v) for every edge (u, v, c) of the graph, where c is
the cost of the edge. Provided h(v) = 0 for every goal vertex v, it is the case that
a monotonic heuristic is optimistic; we leave the proof as an exercise. We do not
need a finite map in the case of a monotonic heuristic because, as we will see below,
no vertex is processed more than once. It is therefore sufficient to keep a set of the
processed vertices. A simple list would do, but it is more efficient to use the set
operations of Section 4.4. Alternatively, we can use the Haskell library Data.Set, 
which contains the operations 

empty  :: Ord a => Set a
member :: Ord a => a -> Set a -> Bool
insert :: Ord a => a -> Set a -> Set a

The functions member and insert take logarithmic time. To avoid name clashes, we 
use a qualified import: 

import qualified Data.Set as S 

Under the assumption that h is monotonic, the following monotonic search algorithm
mstar will find an optimum path to a goal, provided one exists:

mstar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
mstar g h goal source = msearch S.empty start
  where start = insertQ ([source], 0) (h source) emptyQ
        msearch vs ps | nullQ ps     = Nothing
                      | goal (end p) = Just (extract p)
                      | seen (end p) = msearch vs qs
                      | otherwise    = msearch (S.insert (end p) vs) rs
          where seen v  = S.member v vs
                (p, qs) = removeQ ps
                rs      = addListQ (succs g h vs p) qs

This variation on A* search is the one most like breadth-first or depth-first search, 
in that there is a simple set vs to record vertices that have been visited to ensure that 
no vertex is ever processed more than once. We show below that, once a path p to a 
vertex v has been found, then p has the minimum cost of any path from the source 
to v, so no further paths to v need be considered. The modified definition of succs
reads 

succs :: Graph -> Heuristic -> S.Set Vertex -> Path -> [(Path, Cost)]
succs g h vs p = [extend p v d | (v, d) <- g (end p), not (S.member v vs)]
  where extend (vs, c) v d = ((v : vs, c + d), c + d + h v)

This is more efficient than the previous version because a successor path is never 
added to the frontier if its endpoint has already been processed. 

For the proof that mstar works correctly, suppose path p to vertex v was found
before another path p' to v. We have to show that c(p) ≤ c(p'). Let q' be the initial
segment of p' that was on the frontier when p was selected; let q' end at vertex u
and let r be the continuation of q' that begins at u and constitutes p'. Then

  c(p)
≤   { since p was selected in favour of q' }
  c(q') + h(u) − h(v)
=   { by definition of path costs }
  c(p') − c(r) + h(u) − h(v)
≤   { since h is monotonic and r is a path from u to v }
  c(p')

The last step makes use of a generalisation of monotonicity, namely that, if r is
a path from u to v, then h(u) ≤ c(r) + h(v). We leave the proof as an exercise. In
conclusion, mstar returns an optimal solution if one exists.
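The monotonicity condition used in this proof can be checked mechanically on a finite graph. The sketch below assumes the graph is given as an explicit list of edges (u, v, c); the edge list and the two heuristics hBad and hGood are hypothetical values chosen for illustration.

```haskell
type Vertex = Char
type Cost   = Int
type Edge   = (Vertex, Vertex, Cost)

-- h is monotonic if h u <= c + h v for every edge (u, v, c).
monotonic :: (Vertex -> Cost) -> [Edge] -> Bool
monotonic h es = and [h u <= c + h v | (u, v, c) <- es]

-- Hypothetical edge list, for illustration only.
edges :: [Edge]
edges = [('A', 'B', 5), ('A', 'C', 2), ('C', 'B', 2), ('B', 'D', 5)]

-- hBad violates the condition on the edge from C to B; hGood satisfies it everywhere.
hBad, hGood :: Vertex -> Cost
hBad  v = case v of { 'A' -> 9; 'B' -> 1; 'C' -> 5; _ -> 0 }
hGood v = case v of { 'A' -> 9; 'B' -> 5; 'C' -> 7; _ -> 0 }
```

Such a check only validates the single-edge condition; the generalisation to paths follows by induction on the length of the path.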

The example of Figure 16.1 shows that mstar can return a non-optimal solution
if h is optimistic but not monotonic. The heuristic function

        A   B   C   D
    h   9   1   5   0

is not monotonic: the edge from C to B has cost 2 but h(C) > 2 + h(B). As before,
the algorithm starts off with the single entry A (0 + 9). In the next step the queue
has two entries AB (5 + 1), AC (2 + 5). The path with the lowest priority is AB, so
the next queue is ABD (10 + 0), AC (2 + 5). The next path to be expanded is AC
and, since B has already been processed, the next queue consists of the single entry
ABD (10 + 0), which is the final non-optimal result.
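The trace above can be replayed with a small self-contained sketch of mstar. It assumes that the graph contains just the edges implied by the trace (A to B with cost 5, A to C with cost 2, B to D with cost 5, and C to B with cost 2), and it uses a naive sorted list in place of the priority queue; the names graph and hfun are assumptions made for the example.

```haskell
import qualified Data.Set as S
import Data.List (insertBy)
import Data.Ord (comparing)

type Vertex    = Char
type Cost      = Int
type Graph     = Vertex -> [(Vertex, Cost)]
type Heuristic = Vertex -> Cost
type Path      = ([Vertex], Cost)      -- vertices in reverse order, with cost

-- Naive stand-in for the priority queue: a list sorted by priority.
addListQ :: [(a, Cost)] -> [(a, Cost)] -> [(a, Cost)]
addListQ xps q = foldr (insertBy (comparing snd)) q xps

end :: Path -> Vertex
end = head . fst

succs :: Graph -> Heuristic -> S.Set Vertex -> Path -> [(Path, Cost)]
succs g h vs p = [extend p v d | (v, d) <- g (end p), not (S.member v vs)]
  where extend (us, c) v d = ((v : us, c + d), c + d + h v)

mstar :: Graph -> Heuristic -> (Vertex -> Bool) -> Vertex -> Maybe Path
mstar g h goal source = msearch S.empty [(([source], 0), h source)]
  where
    msearch _ [] = Nothing
    msearch vs ((p, _) : qs)
      | goal (end p)        = Just (reverse (fst p), snd p)
      | S.member (end p) vs = msearch vs qs
      | otherwise           = msearch (S.insert (end p) vs) rs
      where rs = addListQ (succs g h vs p) qs

-- Edges implied by the trace in the text (assumption: no other edges matter).
graph :: Graph
graph 'A' = [('B', 5), ('C', 2)]
graph 'B' = [('D', 5)]
graph 'C' = [('B', 2)]
graph _   = []

-- The optimistic but non-monotonic heuristic tabulated above.
hfun :: Heuristic
hfun v = case v of { 'A' -> 9; 'B' -> 1; 'C' -> 5; _ -> 0 }
```

Under these assumptions mstar graph hfun (== 'D') 'A' yields the path ABD with cost 10, even though the path through C and B to D costs only 9.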

Here is a more elaborate example that illustrates another aspect of the mstar
algorithm:



The source node is A and the single goal is F. Suppose h is the monotonic function

        A    B    C   D   E   F
    h   10   10   5   5   0   0

The first queue has the single entry A (0 + 10). Vertex A is added to the visited list,
and the next queue is

AB (3 + 10), AC (10 + 5), AE (20 + 0), AD (20 + 5)

Vertex B is added to the visited list, and the next queue is

ABC (8 + 5), AC (10 + 5), ABD (11 + 5), AE (20 + 0), ABE (23 + 0),
AD (20 + 5)

The queue contains redundant entries, since there are two paths to each of C, D, and
E, only one of which in each case will ever be further explored. We will see how to
deal with redundancy later on. Note also that the revised definition of succs means
that the additional path ABA is not added to the queue because A has already been
visited. Vertex C is added to the visited list, and the next queue is

AC (10 + 5), ABCD (10 + 5), ABD (11 + 5), ABCE (18 + 0), AE (20 + 0),
ABE (23 + 0), AD (20 + 5)

There are now three paths to each of D and E on the queue, two of which are
redundant. The path with the lowest priority is AC with cost 10, but this path is
abandoned since C has been added to the visited list and a better path ABC with
cost 8 has already been found. Instead, ABCD is selected, D is added to the visited
list, and the next queue is

ABD (11 + 5), ABCDE (16 + 0), ABCE (18 + 0), AE (20 + 0),
ABE (23 + 0), AD (20 + 5)

There are four remaining paths to E on the queue, three of which are redundant.
The (first) path with the lowest priority is ABD, but this is rejected since D is on
the visited list. Instead the path ABCDE is chosen, and after one more step the final
path ABCDEF with cost 17 is returned.

As can be appreciated from this example, it is possible for many redundant entries 
to be added to the queue. Depending on the connectivity of the graph, that can lead 
to a good deal of unnecessary computation, adding entries to the queue only to 
delete them again at a later stage. 

One solution to this problem is to employ a more refined data structure than a 
priority queue, called a priority search queue (PSQ). In a PSQ there are values and 
priorities but also keys. The idea is that there is at most one value in a PSQ with a 
given key. In the present example, values are paths, along with their costs, and keys 
are the end vertices of paths. The five queue operations, adapted to priority search 
queues, are as follows: 

insertQ  :: (Ord k, Ord p) => (a -> k) -> a -> p -> PSQ a k p -> PSQ a k p
addListQ :: (Ord k, Ord p) => (a -> k) -> [(a, p)] -> PSQ a k p -> PSQ a k p
deleteQ  :: (Ord k, Ord p) => (a -> k) -> PSQ a k p -> ((a, p), PSQ a k p)
emptyQ   :: PSQ a k p
nullQ    :: PSQ a k p -> Bool

Figure 16.2 A warehouse with obstacles

The type PSQ has three parameters: values, keys, and priorities. The first three
functions, insertQ, addListQ, and deleteQ, take an extra argument, a function for
extracting keys from values. For both astar and mstar the key function is end for
extracting the endpoint of a path. The function insertQ works as follows: if there
is no value with the given key in the queue, then the value is added to the queue
along with its priority. If there is such a value, only the value with the smaller
priority is kept. The function addListQ takes a key function and a list of value-
priority pairs, and inserts them into a queue as before. The remaining functions
are also as before. The three main queue operations each take logarithmic time in
the size of the queue. The astar and mstar algorithms are unchanged except for
an additional argument to the queue operations. It is beyond our scope to go into
the details of how to implement priority search queues, but see the chapter notes
for references. In fact, the Haskell libraries in the Hackage repository provide a
number of implementations of PSQs, including PSQueue and psqueues, though the
functions provided are slightly different in each case.
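The behaviour of these operations can be modelled by a deliberately naive sketch built on Data.Map. It is not an efficient priority search queue: deleteQ scans all entries, so it takes linear rather than logarithmic time. The representation and the demonstration values key and demo are assumptions made for illustration.

```haskell
import qualified Data.Map as M
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A naive priority search queue: at most one (priority, value) pair per key.
newtype PSQ a k p = PSQ (M.Map k (p, a))

emptyQ :: PSQ a k p
emptyQ = PSQ M.empty

nullQ :: PSQ a k p -> Bool
nullQ (PSQ m) = M.null m

-- Keep whichever of the old and new entries has the smaller priority.
insertQ :: (Ord k, Ord p) => (a -> k) -> a -> p -> PSQ a k p -> PSQ a k p
insertQ key x p (PSQ m) = PSQ (M.insertWith keep (key x) (p, x) m)
  where keep new old = if fst new < fst old then new else old

addListQ :: (Ord k, Ord p) => (a -> k) -> [(a, p)] -> PSQ a k p -> PSQ a k p
addListQ key xps q = foldr (\(x, p) -> insertQ key x p) q xps

-- Remove an entry with the smallest priority; the queue must be non-empty.
deleteQ :: (Ord k, Ord p) => (a -> k) -> PSQ a k p -> ((a, p), PSQ a k p)
deleteQ key (PSQ m) = ((x, p), PSQ (M.delete (key x) m))
  where (p, x) = minimumBy (comparing fst) (M.elems m)

-- Hypothetical demo: paths are reversed vertex lists, keyed by their endpoint.
-- Two paths end at 'C'; only the one with the smaller priority survives.
key :: (String, Int) -> Char
key = head . fst

demo :: PSQ (String, Int) Char Int
demo = addListQ key [(("CA", 10), 15), (("CBA", 8), 13), (("DA", 20), 25)] emptyQ
```

With this model the redundant entry for C with priority 15 never enters the queue, which is the effect the text describes.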


16.3 Navigating a warehouse 

We will give two illustrations of A* search, the first of which concerns the problem 
of navigating around a warehouse filled with obstacles. This is the task that would 
face an autonomous vehicle which has to find a path from a given starting point in 
the warehouse to a given destination, taking care to avoid collisions. For example, 
consider the warehouse shown in Figure 16.2, which contains a haphazard collection
of unit-sized boxes. What is wanted is a path from the top-left corner of the
warehouse to the bottom-right corner. Sections of the path have to be straight lines
starting at one grid point and ending at another, avoiding any box along the way.

Figure 16.3 The continuous line shows an optimal fixed-angle path with cost
25.07. The dashed line is a variable-angle path with cost 23.64 and the dotted path
is an optimal variable-angle path with cost 21.48.


Different solutions are available depending on precisely how the vehicle is allowed
to move between grid points. The most restrictive rule is that it can move at
each step only horizontally or vertically to an adjacent grid point. A more relaxed 
rule would allow the vehicle to move diagonally at a 45 degree angle, so each grid 
point has up to eight rather than four neighbours. In either of these scenarios the 
target of each edge that makes up a path has to be unoccupied by a box. Finally, the 
vehicle may be allowed in one step to move from any one grid point to any other, 
turning through an arbitrary angle to do so. Not only has the target to be unoccupied 
by a box, the step has to avoid touching any line segment that forms part of the 
perimeter of a box. Figure 16.3 shows three such solutions. Firstly, the continuous 
line describes a path that moves only from one grid point to a neighbouring one, 
allowing diagonal moves. The cost of this path is the sum of the costs of the edges, 
where an edge costs 1 for a horizontal or vertical move and √2 for a diagonal move,
so distances are Euclidean. The path consists of 18 straight moves and 5 diagonal
moves, for a total distance of 18 + 5√2 ≈ 25.07. There are other paths with the
same minimum cost, but we will leave finding them as an exercise. The other two 
paths are both variable-angle paths obtained by different means. The dotted line 
shows a path in which every point on the grid that is visible to a starting point is 
a possible neighbour. Finally, the dashed line is a path obtained by smoothing the 
fixed-angle path in a manner described below. 

The layout of a warehouse can be described in terms of a grid. The points on a grid
of size m × n are defined by coordinates (x, y), where 1 ≤ x ≤ m and 1 ≤ y ≤ n, with
the four lines x = 0, x = m + 1, y = 0, and y = n + 1 acting as barriers. Obstacles are
made up of unit-size boxes each occupying four grid points. The vertex identifying
each box is, say, its top-left corner. Hence we define

type Coord  = Nat
type Vertex = (Coord, Coord)
type Box    = Vertex
type Grid   = (Nat, Nat, [Box])

boxes :: Grid -> [Box]
boxes (_, _, bs) = bs

The four corners of a box are given by

corners :: Box -> [Vertex]
corners (x, y) = [(x, y), (x + 1, y), (x + 1, y - 1), (x, y - 1)]

In the fixed-angle solution, the neighbours of a grid point are any of its eight adjacent
grid points that are not boundary points or points occupied by a box. The function
neighbours can be defined in a number of ways, including by using a fixed array:

type Graph = Vertex -> [Vertex]

neighbours :: Grid -> Graph
neighbours grid = filter (free grid) . adjacents

adjacents :: Vertex -> [Vertex]
adjacents (x, y) = [(x - 1, y - 1), (x - 1, y), (x - 1, y + 1),
                    (x, y - 1), (x, y + 1),
                    (x + 1, y - 1), (x + 1, y), (x + 1, y + 1)]

free :: Grid -> Vertex -> Bool
free (m, n, bs) = (a !)
  where a = listArray ((0, 0), (m + 1, n + 1)) (repeat True)
              // [((x, y), False) | x <- [0 .. m + 1], y <- [0, n + 1]]
              // [((x, y), False) | x <- [0, m + 1], y <- [1 .. n]]
              // [((x, y), False) | b <- bs, (x, y) <- corners b]

Recall that // is the array update function. A grid point is free if it is not on a 
horizontal or vertical border and not occupied by a box. Use of an array means that 
the neighbours of a grid point can be computed in constant time. 
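The array-based definitions can be exercised on a tiny grid. The following sketch repeats them in self-contained form, modelling Nat by Int (an assumption) and writing adjacents as a comprehension that produces the same eight points in the same order; demoGrid is a hypothetical 4 × 4 grid with a single box.

```haskell
import Data.Array

type Nat    = Int                 -- assumption: Nat is modelled by Int here
type Coord  = Nat
type Vertex = (Coord, Coord)
type Box    = Vertex
type Grid   = (Nat, Nat, [Box])

corners :: Box -> [Vertex]
corners (x, y) = [(x, y), (x + 1, y), (x + 1, y - 1), (x, y - 1)]

-- The eight points surrounding (x, y), in the same order as in the text.
adjacents :: Vertex -> [Vertex]
adjacents (x, y) =
  [(x + dx, y + dy) | dx <- [-1, 0, 1], dy <- [-1, 0, 1], (dx, dy) /= (0, 0)]

-- A point is free if it is neither on the border nor a corner of a box.
free :: Grid -> Vertex -> Bool
free (m, n, bs) = (a !)
  where a = listArray ((0, 0), (m + 1, n + 1)) (repeat True)
              // [((x, y), False) | x <- [0 .. m + 1], y <- [0, n + 1]]
              // [((x, y), False) | x <- [0, m + 1], y <- [1 .. n]]
              // [((x, y), False) | b <- bs, (x, y) <- corners b]

neighbours :: Grid -> Vertex -> [Vertex]
neighbours grid = filter (free grid) . adjacents

-- A hypothetical 4 x 4 grid with one box whose top-left corner is (2, 3).
demoGrid :: Grid
demoGrid = (4, 4, [(2, 3)])
```

Here the point (1, 1) has only two free neighbours, (1, 2) and (2, 1): five of its adjacent points lie on the border and (2, 2) is a corner of the box. The sketch is only meant to be applied to points strictly inside the border, since the adjacent points of a border point can fall outside the array bounds.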

The fixed-angle path can now be computed using the function fpath, a modification
of mstar of the previous section. This time we will need the definitions

type Dist = Float
type Path = ([Vertex], Dist)

end :: Path -> Vertex
end = head . fst

extract :: Path -> Path
extract (vs, d) = (reverse vs, d)

We can now define fpath by

fpath :: Grid -> Vertex -> Vertex -> Maybe Path
fpath grid source target = mstar (neighbours grid) source target

Since the heuristic function is fixed and there is only one goal, the type of mstar 
changes to 

mstar :: Graph -> Vertex -> Vertex -> Maybe Path

The three arguments to mstar now consist of a graph, a source vertex, and a target 
vertex. We will not write out the modified definition of mstar because the only 
real difference is a new version of succs, which now takes the target vertex as an 
argument instead of the heuristic function: 

succs :: Graph -> Vertex -> S.Set Vertex -> Path -> [(Path, Dist)]
succs g target visited p =
  [extend p v | v <- g (end p), not (S.member v visited)]
  where extend (u : vs, d) v = ((v : u : vs, dv), dv + dist v target)
          where dv = d + dist u v

Definition of the Euclidean distance function dist is left as an exercise. The heuristic 
function is monotonic, so fpath is guaranteed to find a shortest path under the stated 
travel restrictions if such a path exists. 
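One possible definition of dist, offered only as a sketch and not necessarily the intended answer to the exercise, is the following. It models coordinates by Int (an assumption), and pathCost is a hypothetical helper that sums the edge lengths along a list of vertices.

```haskell
type Vertex = (Int, Int)   -- assumption: coordinates modelled by Int
type Dist   = Float

-- Euclidean distance between two grid points.
dist :: Vertex -> Vertex -> Dist
dist (x1, y1) (x2, y2) = sqrt (fromIntegral (sqr (x1 - x2) + sqr (y1 - y2)))
  where sqr z = z * z

-- Cost of a path given as a list of vertices: the sum of its edge lengths.
pathCost :: [Vertex] -> Dist
pathCost vs = sum (zipWith dist vs (drop 1 vs))
```

With this definition a horizontal or vertical move costs 1 and a diagonal move costs sqrt 2, so a path with 18 straight and 5 diagonal moves costs 18 + 5 * sqrt 2, which is about 25.07.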

Computing a variable-angle path involves an additional complication: not only 
does the endpoint of each segment of the path have to be unoccupied, but also 
the segment itself cannot cross any border of any box. Such a crossing may be 
somewhere between two grid points. For example, only the two endpoints of the 
path segment in 



are grid points, but we have to ensure that no border edge of any box crosses the 
segment. A detailed implementation is postponed to the exercises, but suppose we 
have a function 

visible :: Grid -> Segment -> Bool

where

type Segment = (Vertex, Vertex)

that determines of a grid and a segment whether or not the segment is unimpeded
by a box. Then the variable-angle path of Figure 16.3 that arises by smoothing the
fixed-angle path is computed by

vpath :: Grid -> Vertex -> Vertex -> Maybe Path
vpath grid source target =
  mstar (neighbours grid) (visible grid) source target

where this time mstar has the type

mstar :: Graph -> (Segment -> Bool) -> Vertex -> Vertex -> Maybe Path

and is defined in the same way as before except for a new definition of succs:

succs g vtest target vs p =
  [extend p w | w <- g (end p), not (S.member w vs)]
  where extend (v : vs, d) w = if not (null vs) && vtest (u, w)
                               then ((w : vs, du), du + dist w target)
                               else ((w : v : vs, dw), dw + dist w target)
          where u  = head vs
                du = d - dist u v + dist u w
                dw = d + dist v w

The extra argument of succs is the visibility test vtest = visible grid. Each time a 
successor w of a vertex v is added to the list we check that the parent u of v, if 
it exists, is visible to w. If it is, then the vertex v is removed from the path, and 
the added edge proceeds directly from u to w. Such a smoothing step will never 
increase the cost of a path and may decrease it. The running time of this algorithm 
is proportional to the fixed-angle path version except for the additional time spent
evaluating visibility checks.
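The effect of a single smoothing step can be seen in isolation. The sketch below lifts extend to a top-level function that takes the visibility test and the target as explicit arguments; this signature, the Int coordinates, and the local dist are assumptions made so that the example is self-contained.

```haskell
type Vertex  = (Int, Int)
type Dist    = Float
type Path    = ([Vertex], Dist)      -- vertices in reverse order
type Segment = (Vertex, Vertex)

dist :: Vertex -> Vertex -> Dist
dist (x1, y1) (x2, y2) = sqrt (fromIntegral (sqr (x1 - x2) + sqr (y1 - y2)))
  where sqr z = z * z

-- Extend a path ending at v by a new vertex w, bypassing v when its
-- parent u can see w directly. The second component is the priority.
extend :: (Segment -> Bool) -> Vertex -> Path -> Vertex -> (Path, Dist)
extend vtest target (v : vs, d) w
  | not (null vs) && vtest (u, w) = ((w : vs, du), du + dist w target)
  | otherwise                     = ((w : v : vs, dw), dw + dist w target)
  where u  = head vs
        du = d - dist u v + dist u w
        dw = d + dist v w
extend _ _ ([], _) _ = error "extend: empty path"
```

For the path from (0, 0) to (1, 0), extended by (2, 1), a visibility test that always succeeds gives the two-vertex path from (0, 0) directly to (2, 1) with cost sqrt 5, whereas a test that always fails keeps all three vertices at cost 1 + sqrt 2. By the triangle inequality the smoothed cost is never the larger of the two.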

Finally, the optimal variable-angle path of Figure 16.3 can be obtained as an 
instance of fpath that takes the neighbours of a grid point to be all points on the grid
visible to it:

neighbours (m, n, bs) (x1, y1) =
  [(x2, y2) | x2 <- [1 .. m], y2 <- [1 .. n], visible (m, n, bs) ((x1, y1), (x2, y2))]

However, this method is costly and only realistic for small grids. 


16.4 The 8-puzzle 

The 8-puzzle is an example of a class of problems known as sliding-block puzzles. It
is a smaller version of the famous 15-puzzle popularised by Sam Loyd in the 1880s.
The puzzle consists of eight tiles arranged in a 3 × 3 grid with one empty space (the
15-puzzle is identical except there are 15 tiles on a 4 × 4 grid). An example is shown
in Figure 16.4. Tiles are numbered from 1 to 8, and any tile adjacent to the empty
space can be moved into it. The aim is to get from some given initial grid to a given
final grid, such as the one pictured on the right of the figure. In other sliding-block
puzzles the tiles can be of different sizes and may be coloured rather than numbered.
The aim is to slide the pieces around to get to some pleasing final arrangement from
a given starting point.

Figure 16.4 An initial grid of the 8-puzzle and the required final grid

The popularity of the 15-puzzle arose partly because Loyd asked for a solution 
to an impossible problem: he gave an initial grid for which there was no sequence 
of moves that could lead to the final grid. In fact, whatever the final grid happens 
to be, only half the initial grids are solvable. The same holds true for the 8-puzzle. 
The proof turns on three ideas, the first of which is the parity of a permutation. The 
parity of a permutation is the parity of its inversion count, a concept introduced in 
Section 7.2. The inversion count is the number of pairs of elements of a permutation
p that are out of place, namely when i < j and p(i) > p(j). If we imagine the empty
space to be a 0, then the final permutation 123456780 of Figure 16.4 has an inversion 
count of 8, while the initial permutation 083256147 has a count of 14. So both 
permutations have even parity. 
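The two inversion counts quoted above are easy to confirm with a direct, quadratic-time definition. The function names invCount and evenParity are assumptions made for this sketch.

```haskell
import Data.List (tails)

-- Number of pairs that are out of order: an earlier element exceeds a later one.
-- This is a simple quadratic-time definition, chosen for clarity.
invCount :: Ord a => [a] -> Int
invCount xs = length [() | (y : ys) <- tails xs, z <- ys, y > z]

-- True if the inversion count is even.
evenParity :: Ord a => [a] -> Bool
evenParity = even . invCount
```

Applied to the strings "123456780" and "083256147", invCount gives 8 and 14 respectively, so evenParity holds for both.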

Next, a transposition of a permutation is when any two different elements are
swapped. We claim that any transposition changes the parity of the permutation. To
see this, consider the transposition (i j), where without loss of generality we suppose
that i occurs before j in the list p(1), p(2), ..., p(n). Let s be the segment between i
and j and let L_i be the number of elements of s less than i, and B_i the number of
elements bigger than i; similarly for L_j and B_j. It follows that L_i + B_i = L_j + B_j = m,
where m is the length of s. The inversion count c of the permutation p can now be
expressed as

c = c_0 + L_i + B_j

where c_0 is the contribution to the inversion count of elements outside the segment
s. The inversion count after the transposition is given by

c' = c_0 + L_j + B_i ± 1

where the final term is positive if i < j and negative if i > j. That means

c' = (c − L_i − B_j) + L_j + B_i ± 1 = c − 2 L_i + 2 L_j ± 1

so c' is odd if c is even and vice versa. This shows in particular that if the initial and
final permutations are both even, then the first can lead to the second only with an
even number of moves.

The third idea involves Manhattan distances. The Manhattan distance of a tile is 
the number of vertical and horizontal moves required to place the tile in its correct 
place in the final grid. For example, the Manhattan distance of each of the five tiles
1, 2, 4, 7, 8 in the first grid of Figure 16.4 is two, because each tile has to move two 
places to reach its final resting place. The three remaining tiles, tiles 3, 5, and 6, 
each have a Manhattan distance of zero, and the Manhattan distance of the empty 
space is four. Each move in the puzzle corresponds to a transposition of the empty 
space with a tile, and each transposition changes both the parity of the permutation 
and the parity of the Manhattan distance of the empty space. 

It follows that starting with an EE permutation (even parity of permutation and 
even parity of the Manhattan distance of the empty space) can only lead to an 
OO or an EE permutation and never to an OE or EO one. Similarly an OE or EO
permutation can never lead to a permutation in which the two parities are the same.
The final step in the proof is to observe that the four classes, EE, OE, EO, and OO,
divide the set of permutations into equal sizes. For the proof, consider the row in
which the empty space occurs. Transposing the two tiles in the row changes the
parity of the permutation but not the Manhattan distance of the empty space. This
produces a bijection between OE and EE and a bijection between EO and OO, so
they must all be of equal size. 

The next task is to consider what heuristic functions might be useful in solving
the 8-puzzle. At least half a dozen functions have been proposed in the literature, but
we will discuss just two of the best known. The first is to take h(g) simply to be the
number of tiles in the grid g that are out of place. As we have seen, in Figure 16.4
there are five out-of-place tiles, so h(g) = 5. The function h is monotonic (see the
exercises), so mstar is guaranteed to find a shortest path.

The second function is the sum of the Manhattan distances of the tiles from 
their current positions to their final resting places (but not including the Manhattan 
distance of the blank space). For our example we have h(g) = 10 because the
Manhattan distance of each out-of-place tile is 2. The Manhattan heuristic is a 
refinement of the out-of-place heuristic in that it takes account of how out of place 
each tile is. The Manhattan heuristic is also monotonic. 

The next decision concerns the representation of the grid. An obvious choice is a 
3 × 3 array, but this representation can be quite wasteful of space, so we will go for
a more compact one. The idea is to encode a permutation such as 083256147 as a
string of digits, more specifically as an element of Text, a time- and space-efficient
encoding of Unicode text defined in the Haskell library Data.Text. Since many of
the functions imported by this library have names that clash with Standard Prelude
functions, we import the library as a qualified module:

import qualified Data.Text as T

We define a state of the grid by

type Position = Nat
type State    = (T.Text, Position)

perm :: State -> String
perm (xs, j) = T.unpack xs

posn0 :: State -> Position
posn0 (xs, j) = j

The position component of a state is the position of the empty space 0 in the
permutation encoded by the text component, a number between 0 and 8. The
function unpack unpacks a text into a string. The two states of our running example
are therefore represented by

istate, fstate :: State
istate = (T.pack "083256147", 0)
fstate = (T.pack "123456780", 8)

where pack :: String -> Text is another library function in Data.Text.

Each move can shift the position of the empty space to one of its vertical or 
horizontal neighbours. An efficient definition of moves is given by 

type Move = Nat

moves :: State -> [Move]
moves st = moveTable ! (posn0 st)

moveTable :: Array Nat [Nat]
moveTable = listArray (0, 8) [[1, 3], [0, 2, 4], [1, 5],
                              [0, 4, 6], [1, 3, 5, 7], [2, 4, 8],
                              [3, 7], [4, 6, 8], [5, 7]]

The array moveTable lists the neighbours of grid points explicitly. For example, the 
neighbours of grid point 0 are 1 and 3, while grid point 4 has neighbours 1, 3, 5, 
and 7. The function move can be defined by 

move :: State -> Move -> State
move (xs, i) j = (T.replace ty t0 (T.replace t0 tx (T.replace tx ty xs)), j)
  where t0 = T.singleton '0'
        ty = T.singleton '?'
        tx = T.singleton (T.index xs j)

This is a three-step process in which the character x at position j in the text is
replaced by some new character '?', the empty space by x, and finally '?' by the
empty space. The functions replace and index are also library functions of the
module Data.Text, as is singleton, which converts a single character into a text.
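The three-step replacement can be tried out directly. The sketch below restates move in self-contained form, modelling Nat by Int (an assumption) and ignoring the unused old position of the empty space.

```haskell
import qualified Data.Text as T

type Position = Int                 -- assumption: Nat is modelled by Int
type Move     = Int
type State    = (T.Text, Position)

-- Swap the empty space '0' with the tile at position j, via the
-- temporary placeholder character '?'.
move :: State -> Move -> State
move (xs, _) j = (T.replace ty t0 (T.replace t0 tx (T.replace tx ty xs)), j)
  where t0 = T.singleton '0'
        ty = T.singleton '?'
        tx = T.singleton (T.index xs j)

istate :: State
istate = (T.pack "083256147", 0)
```

For instance, move istate 1 passes through the intermediate texts 0?3256147 and 8?3256147 before arriving at 803256147, with the empty space now at position 1. The method relies on every character in the text being distinct and on '?' not occurring in it.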

To solve a given instance we should first check whether a solution is possible. We 
will leave it as an exercise to define the two functions 

icparity :: State -> Bool
mhparity :: State -> State -> Bool

where icparity returns True if the parity of the inversion count is even, and mhparity 
returns True if the Manhattan distance of the empty space in the initial state to its 
resting place in the final state is even. We can then define 

possible :: State -> State -> Bool
possible is fs = (mhparity is fs == (icparity is == icparity fs))

That is, a solution is possible if either the Manhattan parity is even and the inversion 
parities agree, or the Manhattan parity is odd and the inversion parities disagree. 
This uses the fact that the Manhattan distance of the empty space in the final state is 
zero and therefore has even parity. 
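For readers who want to experiment before attempting the exercise, here is one possible sketch of the two parity functions. To stay self-contained it represents the permutation as a String rather than a Text and models Nat by Int; both are assumptions, and the definitions are not necessarily the intended solutions.

```haskell
import Data.List (tails)

type Position = Int
type State    = (String, Position)   -- assumption: String instead of T.Text

-- True if the inversion count of the permutation is even.
icparity :: State -> Bool
icparity (xs, _) = even (length [() | (y : ys) <- tails xs, z <- ys, y > z])

-- True if the Manhattan distance between the two empty-space positions is even.
mhparity :: State -> State -> Bool
mhparity (_, i) (_, j) = even (abs (r1 - r2) + abs (c1 - c2))
  where (r1, c1) = i `divMod` 3
        (r2, c2) = j `divMod` 3

possible :: State -> State -> Bool
possible is fs = mhparity is fs == (icparity is == icparity fs)
```

With the running example, possible ("083256147", 0) ("123456780", 8) holds. Swapping the tiles 8 and 3 in the initial state gives ("038256147", 0), whose inversion count is odd, and the test then fails.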

Here now is the definition of the out-of-place heuristic: 

type Heuristic = State -> State -> Nat

h1 :: Heuristic
h1 is fs = length (filter p (zip (perm is) (perm fs)))
  where p (c, d) = c /= '0' && c /= d

The two permutations are aligned and the number of out-of-place tiles is counted. 

In order to define the Manhattan heuristic, we will need the coordinates of the 
grid points, which we can take to be 

(0,0) (0,1) (0,2)
(1,0) (1,1) (1,2)
(2,0) (2,1) (2,2)

The coordinates of a state are given by a list of coordinates in tile order, showing 
which coordinate position each tile occupies. For example, with the permutation 
083256147 these coordinates are 

[(2,0), (1,0), (0,2), (2,1), (1,1), (1,2), (2,2), (0,1)]

Thus tile 1 occupies position (2,0), tile 2 occupies position (1,0), and so on. If we 
introduce the type synonym 

type Coord = (Nat, Nat)

then the coordinates in tile order are given by

coords :: State -> [Coord]
coords = tail . map snd . sort . addCoords
  where addCoords st = zip (perm st) gridpoints
        gridpoints   = map (`divMod` 3) [0 .. 8]

Each tile is associated with its coordinate position by addCoords, the result is sorted 
into tile order, and the tiles are discarded. The leading position, the position of the 
empty space, is also dropped. 

Now we can define the Manhattan heuristic by 

h2 :: Heuristic
h2 is fs = sum (zipWith d (coords is) (coords fs))
  where d (x0, y0) (x1, y1) = abs (x0 - x1) + abs (y0 - y1)
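Both heuristics can be checked against the values quoted earlier for the running example. The sketch below restates them in self-contained form, representing the permutation as a String rather than a Text and using Int for Nat; both substitutions are assumptions made to avoid extra imports.

```haskell
import Data.List (sort)

type State = (String, Int)   -- assumption: String in place of T.Text, Int for Nat
type Coord = (Int, Int)

perm :: State -> String
perm = fst

-- Out-of-place heuristic: count tiles (not the empty space) in the wrong position.
h1 :: State -> State -> Int
h1 is fs = length (filter p (zip (perm is) (perm fs)))
  where p (c, d) = c /= '0' && c /= d

-- Coordinates of tiles 1 to 8 in tile order; the empty space is dropped.
coords :: State -> [Coord]
coords = tail . map snd . sort . addCoords
  where addCoords st = zip (perm st) gridpoints
        gridpoints   = map (`divMod` 3) [0 .. 8]

-- Manhattan heuristic: sum of the Manhattan distances of the tiles.
h2 :: State -> State -> Int
h2 is fs = sum (zipWith d (coords is) (coords fs))
  where d (x0, y0) (x1, y1) = abs (x0 - x1) + abs (y0 - y1)
```

For the states ("083256147", 0) and ("123456780", 8) these definitions give h1 = 5 and h2 = 10, in agreement with the text.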

The mstar algorithm maintains a queue of paths, where a path is defined by

type Path = ([Move], Nat, State)

key :: Path -> State
key (ms, k, st) = st

Each path records a sequence of moves, the length of the sequence, and the final
state at the end of the moves. The algorithm makes use of priority search queues
and the library Data.Set as used in Section 16.2:

mstar :: Heuristic -> State -> State -> Maybe [Move]
mstar h istate fstate =
  if possible istate fstate then msearch S.empty start else Nothing
  where start = insertQ key ([], 0, istate) (h istate fstate) emptyQ
        msearch vs ps | st == fstate   = Just (reverse ms)
                      | S.member st vs = msearch vs qs
                      | otherwise      = msearch (S.insert st vs) rs
          where ((ms, k, st), qs) = removeQ key ps
                rs = addListQ key (succs h fstate (ms, k, st) vs) qs

The revised definition of succs is

succs :: Heuristic -> State -> Path -> S.Set State -> [(Path, Nat)]
succs h fstate (ms, k, st) vs
  = [((m : ms, k + 1, st'), k + 1 + h st' fstate)
    | m <- moves st, let st' = move st m, not (S.member st' vs)]
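The pieces can be assembled into a compact, self-contained solver. This is a sketch under several assumptions rather than the program measured below: states use String instead of Text, the priority search queue is replaced by a plain list sorted by priority (so redundant entries are simply skipped when they are removed), moves is computed arithmetically instead of from a table, and the possible check is omitted, which means the sketch should only be run on solvable instances.

```haskell
import qualified Data.Set as S
import Data.List (insertBy, sort)
import Data.Ord (comparing)

type State = (String, Int)      -- permutation and position of the empty space
type Move  = Int
type Path  = ([Move], Int, State)

-- Positions at Manhattan distance 1 from the empty space.
moves :: State -> [Move]
moves (_, i) = [j | j <- [0 .. 8], abs (r - j `div` 3) + abs (c - j `mod` 3) == 1]
  where (r, c) = i `divMod` 3

-- Swap the empty space with the tile at position j.
move :: State -> Move -> State
move (xs, _) j = (map swap xs, j)
  where x = xs !! j
        swap ch | ch == '0' = x
                | ch == x   = '0'
                | otherwise = ch

-- Manhattan heuristic, as in the definition of h2.
h2 :: State -> State -> Int
h2 is fs = sum (zipWith d (coords is) (coords fs))
  where coords st = tail (map snd (sort (zip (fst st) grid)))
        grid      = [k `divMod` 3 | k <- [0 .. 8 :: Int]]
        d (x0, y0) (x1, y1) = abs (x0 - x1) + abs (y0 - y1)

mstar :: (State -> State -> Int) -> State -> State -> Maybe [Move]
mstar h istate fstate = msearch S.empty [(([], 0, istate), h istate fstate)]
  where
    msearch _ [] = Nothing
    msearch vs (((ms, k, st), _) : qs)
      | st == fstate   = Just (reverse ms)
      | S.member st vs = msearch vs qs
      | otherwise      = msearch (S.insert st vs) rs
      where rs = foldr (insertBy (comparing snd)) qs
                   [ ((m : ms, k + 1, st'), k + 1 + h st' fstate)
                   | m <- moves st, let st' = move st m, not (S.member st' vs) ]

start, final :: State
start = ("083256147", 0)
final = ("123456780", 8)
```

Evaluating mstar h2 start final should produce a shortest sequence of moves; because ties are broken differently from a priority search queue, the particular sequence returned may differ from the one quoted in the text, though its length should be the same.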

Both the out-of-place and the Manhattan heuristics find a solution much more
quickly than breadth-first search, with the Manhattan heuristic proving superior in
many examples. Here for comparison purposes are typical running times with GHCi
and numbers of steps, where bfsolve computes a solution using a simple breadth-first
search, in each case returning the same solution [3,6,7,8,5,4,1,0,3,6,7,4,5,8]:

              time     moves
    bfsolve   0.60s    6450
    mstar h1  0.02s    138
    mstar h2  0.01s    35

From the initial state 032871456, there is an even more dramatic improvement in
computing the solution [1,4,3,6,7,4,1,0,3,4,5,2,1,4,5,8,7,6,3,0,1,4,7,8]:

              time      moves
    bfsolve   920.01s   312963
    mstar h1  3.37s     15765
    mstar h2  0.41s     2032

16.5 Chapter notes 

The first description of A* search was given in 1968, see [2], as part of a project for 
building a robot that could plan its own actions. A definitive study of the algorithm 
was later given in [1]. A general study of heuristics and how to choose good ones can 
be found in Judea Pearl’s book [6]. Applications of heuristic techniques to problems 
in artificial intelligence can be found in [7], among many other books. Priority 
search queues are described in [3]. The variable-angle path planning algorithms are 
taken from [5]. The 15-puzzle was invented by Noyes P. Chapman, not Sam Loyd, 
and a proof that only a half of the initial positions were solvable was first given in 
1879, see [4].

Exercises

Exercise 16.1 Is the function H well-defined if edge costs are not constrained to be 
positive? Is H well-defined if edge costs are positive but not necessarily integers?

Exercise 16.2 Consider the graph 



and the optimistic heuristic h(A) = h(B) = 4 and h(C) = 0. Will tstar find the path
ABC?

Exercise 16.3 Why is Dijkstra's algorithm a special case of A* search?

Exercise 16.4 Give a simple graph to show that tstar does not always return a 
shortest path if the priority is just the heuristic estimate for completing the journey. 

Exercise 16.5 The function insert of Data.Map has type 

insert :: Ord k ^ k ^ a ^ Map ka^ Map k a 

What assumption has been made about the use of insert in astar and mstarl 

Exercise 16.6 Is the heuristic that returns the straight-line distance between a town 
and the closest goal a monotonic heuristic? 

Exercise 16.7 Let the minimum edge cost be c. Is the constant function h(v) = c 
optimistic? 

Exercise 16.8 Show that, if h(v) = 0 for every goal vertex v, then h is optimistic if 
h is monotonic. 

Exercise 16.9 Define f(p) = c(p) + h(v), where v is the endpoint of path p. Show 
that, if p is a prefix of q, then f(p) ≤ f(q). 

Exercise 16.10 How many other fixed-angle paths with 18 straight moves and 5 
diagonal moves are there in the grid of Figure 16.3? 

Exercise 16.11 Define the function dist used in the warehouse problem. 

Exercise 16.12 Determining whether or not two arbitrary line segments intersect 
is a fundamental task in computational geometry. The full algorithm is a little 
complicated, as a number of different cases have to be considered. However, in the 
warehouse situation the task can be simplified somewhat, even though a number of 
cases still have to be distinguished. First of all, what does one have to show if the 
segment of the path under construction is horizontal, vertical, or a diagonal slope at 
an angle of 45 degrees? 
Exercise 16.13 Following on from the previous question, in the remaining case we 
have to check that the endpoint of the segment is free from obstruction, and that no 
border of any box crosses the segment. The borders of the boxes in a grid can be 
defined by 

    borders :: Grid -> [Segment]
    borders = concatMap (edges . corners) . boxes
      where edges [u,v,w,x] = [(u,v),(w,v),(x,w),(x,u)]

However, testing all border segments for whether they cross a given segment s will 
involve border segments that may be far away from s. It is better to filter the borders 
for those that are near s in some suitable sense. What is a suitable definition of near? 

Exercise 16.14 Following on, the definition of visible now takes the form 

    visible :: Grid -> Segment -> Bool
    visible g s | hseg s    = all (free g) (ypoints s)
                | vseg s    = all (free g) (xpoints s)
                | dseg s    = all (free g) (dpoints s)
                | eseg s    = all (free g) (epoints s)
                | otherwise = free g (snd s) && all (not . crosses s) es
      where es = filter (near s) (borders g)

A segment satisfies hseg if it is horizontal, vseg if it is vertical, dseg if the sum of 
the two coordinates of the endpoints of the segment is the same, so the diagonal is 
left to right, and eseg if the difference of the coordinates is the same. Write suitable 
definitions of the remaining functions, apart from crosses. 

Exercise 16.15 It remains to define crosses. For this we need to determine the 
orientation of a triangle. The function 

    orientation :: Segment -> Vertex -> Int
    orientation ((x1,y1),(x2,y2)) (x,y)
      = signum ((x - x1) * (y2 - y1) - (x2 - x1) * (y - y1))

returns −1 if the orientation of the triangle ABC, where A = (x1,y1), B = (x2,y2), 
and C = (x,y), is anti-clockwise, +1 if ABC is clockwise, and 0 if the points A, 
B, and C are collinear. For example, in Figure 16.5 the orientation of ABC is 
anti-clockwise and ABD is clockwise. Thus, if CD is a border of some box, then the 
segment crosses it. On the other hand, the segment does not cross EF even though 
ABE and ABF have opposite orientations. Why will the crossing test not be applied 
to EF? Hence define crosses. 
Figure 16.5 ABC and ABE are anti-clockwise, while ABD and ABF are clockwise 


Exercise 16.16 Consider the 3-puzzle, in which there are three tiles and a blank in a 
2x2 grid. Which of the 24 possible initial states are solvable if the final permutation 
of the tiles is 1230? 

Exercise 16.17 Show that the out-of-place heuristic h for the 8-puzzle is monotonic. 
Show also that this would not be true if we counted whether or not the blank tile 
was out of place. 

Exercise 16.18 Show that the Manhattan heuristic is monotonic, but only because 
the distance of the blank tile is not included in the sum. 

Exercise 16.19 Define the functions icparity and mhparity, where icparity returns 
True if the parity of the inversion count is even, and mhparity returns True if the 
Manhattan distance of the blank tile in the initial state to its resting place in the final 
state is even. 


Answers 

Answer 16.1 Not if the graph has cycles with negative costs. As to the second part, 
yes, but not if the graph is infinite. Imagine an infinite graph with a single source s, 
a single target t, and an infinite number of other vertices v_i. There is an edge from s 
to v_i with cost 1/i and an edge from each v_i to t with cost 1. In this case H(s) is not 
well-defined. 
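
The reason can be made concrete: the path through v_i costs 1/i + 1, which gets 
arbitrarily close to 1 without ever reaching it, so no path attains a minimum cost. 
The following is a minimal sketch of that arithmetic using exact rationals; the names 
pathCost and costs are hypothetical and do not appear in the text. 

```haskell
import Data.Ratio ((%))

-- Cost of the path s -> v_i -> t in the graph described in Answer 16.1:
-- the edge (s, v_i) costs 1/i and the edge (v_i, t) costs 1.
-- (pathCost is a hypothetical name used only for this illustration.)
pathCost :: Integer -> Rational
pathCost i = 1 % i + 1

-- The costs of the first n such paths, for inspection.
costs :: Integer -> [Rational]
costs n = map pathCost [1 .. n]
```

Every element of costs n is strictly greater than 1 and strictly smaller than its 
predecessor, whatever n is chosen. 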

Answer 16.2 No. After two steps the queue contains ABC (5 + 0), ABA (0 + 4). 
The path ABA is chosen for further expansion, and this leads to an infinite loop. The 
above graph is disallowed if all edge costs are positive, which is why the assumption 
that edge costs are positive is necessary. 

Answer 16.3 Because Dijkstra’s algorithm is the special case of A* search when 
the heuristic function is h = const 0. This function is both optimistic and monotonic 
(provided edge costs are positive), so Dijkstra’s algorithm is also a special case of 
M* search. 

Answer 16.4 A graph with three vertices suffices: 



The source vertex is A and the goal vertex is C. We take h(A) = 4, h(B) = 2, and 
h(C) = 0. Tree search will select the path AC with cost 5, but the shortest path is 
ABC with cost 4. 

Answer 16.5 That Ord Vertex holds, that is, that Vertex is an instance of Ord. 

Answer 16.6 Yes, in a metric space, monotonicity amounts to the triangle inequality 
being satisfied. 

Answer 16.7 No. If v is a goal, then h(v) = c while H(v) = 0. 

Answer 16.8 Let [v, v_1, v_2, ..., v_n] be a shortest path from v to a goal v_n, with edge 
costs [c_1, c_2, ..., c_n]. If h is monotonic, then 

    h(v) ≤ c_1 + c_2 + ... + c_n + h(v_n) = H(v)

since h(v_n) = 0. Note the necessity of h returning 0 on any goal vertex. 
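
As an illustration of why the condition on goal vertices matters, here is a small 
sketch on a hypothetical three-vertex graph (the edges, the heuristics hGood and hBad, 
and all function names are assumptions made for this example, not taken from the text). 
It checks monotonicity edge by edge and compares each heuristic against H, computed 
by repeated relaxation. 

```haskell
import qualified Data.Map as M

type Vertex = Char
type Edge   = (Vertex, Vertex, Int)   -- (source, target, cost)

-- A hypothetical graph with goal C; costs are positive.
edges :: [Edge]
edges = [('A', 'B', 2), ('B', 'C', 2), ('A', 'C', 5)]

goals :: [Vertex]
goals = "C"

-- h is monotonic if h u <= c + h v for every edge (u, v) of cost c.
monotonic :: (Vertex -> Int) -> Bool
monotonic h = and [ h u <= c + h v | (u, v, c) <- edges ]

-- H: cheapest cost to a goal, found by relaxing every edge repeatedly
-- (enough passes for this small graph; vertices that cannot reach a goal
-- are simply absent from the map).
bigH :: M.Map Vertex Int
bigH = iterate relax start !! length edges
  where
    start   = M.fromList [ (g, 0) | g <- goals ]
    relax m = foldr step m edges
    step (u, v, c) acc = case M.lookup v acc of
      Nothing -> acc
      Just d  -> M.insertWith min u (c + d) acc

-- h is optimistic if it never exceeds H.
optimistic :: (Vertex -> Int) -> Bool
optimistic h = and [ h v <= d | (v, d) <- M.toList bigH ]

-- hGood is zero on the goal; hBad is constant and so non-zero on the goal.
hGood, hBad :: Vertex -> Int
hGood v = case v of { 'A' -> 4; 'B' -> 2; _ -> 0 }
hBad    = const 2
```

Both heuristics pass the monotonic test, but only hGood passes optimistic: hBad 
returns 2 on the goal C, where H is 0. 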

Answer 16.9 It is sufficient to show that this holds when p' is p without its last 
vertex. Suppose u is the endpoint of p' and the edge (u,v) has cost d. Then 

    f(p') = c(p') + h(u) ≤ c(p') + d + h(v) = f(p)

by monotonicity. 

Answer 16.10 We counted 16 paths in total. 

Answer 16.11 We have 

    dist :: Vertex -> Vertex -> Dist
    dist (x1,y1) (x2,y2) = sqrt (fromIntegral (sqr (x2 - x1) + sqr (y2 - y1)))
      where sqr x = x * x

Answer 16.12 In each of these three cases the segment goes only through grid 
points, so we have to check only that all these points are unimpeded by boxes. 
Answer 16.13 We need only consider borders that lie within the rectangular area 
determined by s. Hence 

    near :: Segment -> Segment -> Bool
    near ((x1,y1),(x2,y2)) ((x3,y3),(x4,y4)) = min x1 x2 <= x3 && x4 <= max x1 x2 &&
                                               min y1 y2 <= y3 && y4 <= max y1 y2


Answer 16.14 We can write 

    hseg ((x1,y1),(x2,y2)) = x1 == x2
    vseg ((x1,y1),(x2,y2)) = y1 == y2
    dseg ((x1,y1),(x2,y2)) = x1 + y1 == x2 + y2
    eseg ((x1,y1),(x2,y2)) = x1 - y1 == x2 - y2

and also 

    ypoints ((x1,y1),(x2,y2)) = [(x1,y) | y <- [min y1 y2 .. max y1 y2]]
    xpoints ((x1,y1),(x2,y2)) = [(x,y1) | x <- [min x1 x2 .. max x1 x2]]
    dpoints ((x1,y1),(x2,y2)) = [(x, x1 + y1 - x) | x <- [min x1 x2 .. max x1 x2]]
    epoints ((x1,y1),(x2,y2)) = [(x1 - y1 + y, y) | y <- [min y1 y2 .. max y1 y2]]
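
To see these definitions at work on concrete segments, the following self-contained 
sketch restates them and adds a helper that returns the grid points of a segment 
when one of the four special cases applies. The type synonyms for Vertex and Segment 
and the helper gridPoints are assumptions made for this example; the text may define 
the types differently. 

```haskell
-- Assumed representations (not confirmed by the text).
type Vertex  = (Int, Int)
type Segment = (Vertex, Vertex)

hseg, vseg, dseg, eseg :: Segment -> Bool
hseg ((x1, _), (x2, _))   = x1 == x2
vseg ((_, y1), (_, y2))   = y1 == y2
dseg ((x1, y1), (x2, y2)) = x1 + y1 == x2 + y2
eseg ((x1, y1), (x2, y2)) = x1 - y1 == x2 - y2

ypoints, xpoints, dpoints, epoints :: Segment -> [Vertex]
ypoints ((x1, y1), (_, y2)) = [ (x1, y) | y <- [min y1 y2 .. max y1 y2] ]
xpoints ((x1, y1), (x2, _)) = [ (x, y1) | x <- [min x1 x2 .. max x1 x2] ]
dpoints ((x1, y1), (x2, _)) = [ (x, x1 + y1 - x) | x <- [min x1 x2 .. max x1 x2] ]
epoints ((x1, y1), (_, y2)) = [ (x1 - y1 + y, y) | y <- [min y1 y2 .. max y1 y2] ]

-- Hypothetical helper: the grid points on a segment in the four special
-- cases, or Nothing when the segment falls into the remaining case.
gridPoints :: Segment -> Maybe [Vertex]
gridPoints s
  | hseg s    = Just (ypoints s)
  | vseg s    = Just (xpoints s)
  | dseg s    = Just (dpoints s)
  | eseg s    = Just (epoints s)
  | otherwise = Nothing
```

For instance, gridPoints ((1,4),(3,2)) takes the dseg branch because both endpoints 
have coordinate sum 5, while gridPoints ((0,0),(4,2)) matches none of the four tests. 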

Answer 16.15 The segment EF lies outside the rectangle determined by AB and 
is therefore excluded from consideration by the near test. The function crosses is 
defined by 

    crosses s (v1,v2) = orientation s v1 * orientation s v2 <= 0

Either the two vertices v1 and v2 straddle the segment s, or one of them lies on the 
segment. 
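
The interplay between near and crosses can be tried out with a short sketch. The 
segments ab, cd and ef below are hypothetical coordinates chosen for this example 
(they are not read off Figure 16.5), and the type synonyms are assumptions. Note that 
near, as written, presumes that the first endpoint of a border has the smaller 
coordinates. 

```haskell
-- Assumed representations (not confirmed by the text).
type Vertex  = (Int, Int)
type Segment = (Vertex, Vertex)

orientation :: Segment -> Vertex -> Int
orientation ((x1, y1), (x2, y2)) (x, y) =
  signum ((x - x1) * (y2 - y1) - (x2 - x1) * (y - y1))

crosses :: Segment -> Segment -> Bool
crosses s (v1, v2) = orientation s v1 * orientation s v2 <= 0

near :: Segment -> Segment -> Bool
near ((x1, y1), (x2, y2)) ((x3, y3), (x4, y4)) =
  min x1 x2 <= x3 && x4 <= max x1 x2 &&
  min y1 y2 <= y3 && y4 <= max y1 y2

-- Hypothetical data: a path segment and two vertical box borders.
ab, cd, ef :: Segment
ab = ((0, 0), (4, 2))
cd = ((2, 0), (2, 2))   -- inside the rectangle determined by ab
ef = ((6, 0), (6, 5))   -- beyond the end of ab

-- The borders that actually obstruct ab: filter by near first, then test.
blocking :: [Segment]
blocking = filter (crosses ab) (filter (near ab) [cd, ef])
```

Here crosses ab ef is True on its own, because the endpoints of ef lie on opposite 
sides of the line through ab, yet ef never meets the segment itself. Filtering with 
near first removes ef, so blocking contains only cd. 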

Answer 16.16 The final state is of type EO, so only initial states of type EO or OE 
are solvable. These are the 12 permutations: 

0123 0231 0312 1023 1203 1230 2031 2301 2310 3012 3102 3120 
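
The list can be cross-checked by brute force, since moves are reversible and the 
solvable states are therefore exactly those reachable from the final state. The sketch 
below is not the parity argument of the text; it is a plain breadth-first enumeration 
under one loudly stated assumption: the four cells are listed in cyclic order around 
the grid, so that each cell is adjacent to its neighbours in the string and the first 
and last cells are adjacent to each other. A different cell numbering, such as 
row-major order, changes the adjacency and hence which strings are reachable. All 
names are hypothetical. 

```haskell
import Data.List (nub, sort)

type State = [Int]   -- contents of the four cells; 0 is the blank

-- Assumption: cells are numbered cyclically, 0-1-2-3-0.
neighbours :: Int -> [Int]
neighbours i = [(i + 1) `mod` 4, (i + 3) `mod` 4]

-- All states one move away: exchange the blank with an adjacent tile.
moves :: State -> [State]
moves s = [ map (exchange (s !! j)) s | j <- neighbours b ]
  where
    b = length (takeWhile (/= 0) s)          -- position of the blank
    exchange t x | x == 0    = t
                 | x == t    = 0
                 | otherwise = x

-- Breadth-first enumeration of everything reachable from a start state.
reachable :: State -> [State]
reachable start = go [start] [start]
  where
    go seen []       = seen
    go seen frontier = go (seen ++ new) new
      where new = nub [ t | s <- frontier, t <- moves s, t `notElem` seen ]

-- The states from which 1230 can be reached, as sorted digit strings.
solvable :: [String]
solvable = sort (map (concatMap show) (reachable [1, 2, 3, 0]))
```

Under the stated assumption, solvable has 12 elements and agrees with the list above. 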

Answer 16.17 Let u be a state of the 8-puzzle and v be the state that results when 
any tile t is exchanged with the blank tile. Such a move changes the value of h(u) 
by +1 if t was in its correct place, by 0 if neither the starting point nor the endpoint 
of t is its correct place, or −1 if the endpoint of t is its correct place. In all cases we 
have h(u) ≤ 1 + h(v), given that the cost of a move is 1. 

If we counted whether or not the empty space was out of place, then a move 
could reduce the value of h(u) by 2 if both t and the empty space were now in their 
correct places. But then monotonicity would require h(u) ≤ 1 + h(v) = h(u) − 1, 
which is impossible. 
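
Both claims can be tested mechanically. The following sketch checks the inequality 
h(u) ≤ 1 + h(v) over a collection of states; it assumes a state is a list of the nine 
cell contents in row-major order with 0 for the blank, and a final state 123456780. 
These representations and all function names are assumptions for this example and 
may differ from the definitions used in the chapter. 

```haskell
import Data.List (permutations)

type State = [Int]   -- assumption: cells in row-major order, 0 is the blank

final :: State
final = [1, 2, 3, 4, 5, 6, 7, 8, 0]

-- Cells adjacent to cell i in a 3x3 grid.
adjacent :: Int -> [Int]
adjacent i =
  [ 3 * r' + c'
  | (r', c') <- [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
  , r' >= 0, r' < 3, c' >= 0, c' < 3 ]
  where (r, c) = i `divMod` 3

-- All states one move away: exchange the blank with an adjacent tile.
moves :: State -> [State]
moves s = [ map (exchange (s !! j)) s | j <- adjacent b ]
  where
    b = length (takeWhile (/= 0) s)
    exchange t x | x == 0    = t
                 | x == t    = 0
                 | otherwise = x

-- Number of tiles out of place, ignoring the blank.
outOfPlace :: State -> Int
outOfPlace s = length [ () | (x, y) <- zip s final, x /= 0, x /= y ]

-- Variant that also counts the blank.
outOfPlaceWithBlank :: State -> Int
outOfPlaceWithBlank s = length [ () | (x, y) <- zip s final, x /= y ]

-- Does h u <= 1 + h v hold for every move from every given state?
monotonicOn :: (State -> Int) -> [State] -> Bool
monotonicOn h us = and [ h u <= 1 + h v | u <- us, v <- moves u ]

allStates :: [State]
allStates = permutations final
```

Evaluating monotonicOn outOfPlace allStates examines every permutation, while 
monotonicOn outOfPlaceWithBlank already fails on the single state 
[1,2,3,4,5,6,7,0,8], which is one move from the final state but has value 2. 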
Answer 16.18 A similar argument to the previous question applies. Moving a tile 
can increase or decrease the value of the heuristic by 1 (and by −2 if the empty 
space is included in the sum and both tiles are in their correct places after the swap). 

Answer 16.19 We have 

    icparity :: State -> Bool
    icparity = even . ic . perm

    mhparity :: State -> State -> Bool
    mhparity is fs = even (dist (posn0 is) (posn0 fs))
      where dist i j = abs (x0 - x1) + abs (y0 - y1)
              where (x0,y0) = i `divMod` 3
                    (x1,y1) = j `divMod` 3

The function ic for computing the inversion count was defined in Section 7.2. 
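
For experimentation, here is a self-contained sketch of the two parity functions. It 
rests on several assumptions that are not confirmed by this section: that a State 
pairs a permutation with the position of the blank, that perm and posn0 are the two 
projections, and that ic counts inversions over the whole permutation including 0. 
The quadratic ic below is a stand-in, not the definition from Section 7.2. 

```haskell
import Data.List (tails)

-- Assumed representations (hypothetical; the chapter's own may differ).
type Perm  = [Int]
type State = (Perm, Int)   -- the permutation and the position of the blank

perm :: State -> Perm
perm = fst

posn0 :: State -> Int
posn0 = snd

-- Stand-in inversion count: pairs that appear in decreasing order.
ic :: Perm -> Int
ic xs = length [ () | (x : ys) <- tails xs, y <- ys, x > y ]

icparity :: State -> Bool
icparity = even . ic . perm

mhparity :: State -> State -> Bool
mhparity is fs = even (dist (posn0 is) (posn0 fs))
  where
    dist i j = abs (x0 - x1) + abs (y0 - y1)
      where (x0, y0) = i `divMod` 3
            (x1, y1) = j `divMod` 3
```

With the final state ([1,2,3,4,5,6,7,8,0], 8), the state ([1,2,3,4,5,0,7,8,6], 5) is 
one move away, and under these assumptions both icparity and mhparity (relative 
to the final state) change from True to False. 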